@@ -49,8 +49,6 @@
Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless
SQL on data warehouses and lakehouses.
-
-
## What is Kyuubi?
Kyuubi provides a pure SQL gateway through a Thrift JDBC/ODBC interface for end-users to manipulate large-scale data with pre-programmed and extensible Spark SQL engines. This "out-of-the-box" model minimizes the barriers and costs of using Spark on the client side. On the server side, the multi-tenant architecture of the Kyuubi server and engines provides administrators a way to achieve computing resource isolation, data security, high availability, high client concurrency, etc.
@@ -84,7 +82,7 @@ HiveServer2 can identify and authenticate a caller, and then if the caller also
Kyuubi extends the use of STS in a multi-tenant model based on a unified interface, and relies on multi-tenancy to interact with cluster managers to gain resource sharing/isolation and data security. The loosely coupled architecture of the Kyuubi server and engines dramatically improves client concurrency and the stability of the service itself.
-#### DataLake/LakeHouse Support
+#### DataLake/Lakehouse Support
The vision of Kyuubi is to unify the portal and become an easy-to-use data lake management platform. Different kinds of workloads, such as ETL processing and BI analytics, can be supported by one platform, using one copy of data, with one SQL interface.
@@ -105,11 +103,7 @@ and others would not be possible without your help.
![](./docs/imgs/kyuubi_ecosystem.drawio.png)
-## Online Documentation
-
-Since Kyuubi 1.3.0-incubating, the Kyuubi online documentation is hosted by [https://kyuubi.apache.org/](https://kyuubi.apache.org/).
-You can find the latest Kyuubi documentation on [this web page](https://kyuubi.readthedocs.io/en/master/).
-For 1.2 and earlier versions, please check the [Readthedocs](https://kyuubi.readthedocs.io/en/v1.2.0/) directly.
+## Online Documentation
## Quick Start
@@ -117,9 +111,32 @@ Ready? [Getting Started](https://kyuubi.readthedocs.io/en/master/quick_start/) w
## [Contributing](./CONTRIBUTING.md)
-## Contributor over time
+## Project & Community Status
-[![Contributor over time](https://contributor-graph-api.apiseven.com/contributors-svg?chart=contributorOverTime&repo=apache/kyuubi)](https://api7.ai/contributor-graph?chart=contributorOverTime&repo=apache/kyuubi)
+
## Aside
@@ -127,7 +144,3 @@ The project took its name from a character of a popular Japanese manga - `Naruto
The character is named `Kyuubi Kitsune/Kurama`, which is a nine-tailed fox in mythology.
`Kyuubi` spread the power and spirit of fire, which is used here to represent the powerful [Apache Spark](http://spark.apache.org).
Its nine tails stand for end-to-end multi-tenancy support of this project.
-
-## License
-
-This project is licensed under the Apache 2.0 License. See the [LICENSE](./LICENSE) file for details.
diff --git a/bin/kyuubi-zk-cli b/bin/kyuubi-zk-cli
index 089b7ad186c..f503c3e5a5e 100755
--- a/bin/kyuubi-zk-cli
+++ b/bin/kyuubi-zk-cli
@@ -17,7 +17,7 @@
#
## Zookeeper Shell Client Entrance
-CLASS="org.apache.zookeeper.ZooKeeperMain"
+CLASS="org.apache.kyuubi.shaded.zookeeper.ZooKeeperMain"
export KYUUBI_HOME="$(cd "$(dirname "$0")"/..; pwd)"
diff --git a/bin/load-kyuubi-env.sh b/bin/load-kyuubi-env.sh
index bfb92265869..4d6f72ddf3e 100755
--- a/bin/load-kyuubi-env.sh
+++ b/bin/load-kyuubi-env.sh
@@ -69,6 +69,44 @@ if [[ -z ${JAVA_HOME} ]]; then
fi
fi
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS -XX:+IgnoreUnrecognizedVMOptions"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS -Dio.netty.tryReflectionSetAccessible=true"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.lang=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.lang.invoke=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.lang.reflect=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.io=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.net=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.nio=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.util=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.util.concurrent=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.nio.cs=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.security.action=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.security.tools.keytool=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.security.x509=ALL-UNNAMED"
+KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS --add-opens=java.base/sun.util.calendar=ALL-UNNAMED"
+export KYUUBI_JAVA_OPTS="$KYUUBI_JAVA_OPTS"
+
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS -XX:+IgnoreUnrecognizedVMOptions"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS -Dio.netty.tryReflectionSetAccessible=true"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.lang=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.lang.invoke=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.lang.reflect=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.io=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.net=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.nio=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.util=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.util.concurrent=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.nio.cs=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.security.action=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.security.tools.keytool=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.security.x509=ALL-UNNAMED"
+KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS --add-opens=java.base/sun.util.calendar=ALL-UNNAMED"
+export KYUUBI_CTL_JAVA_OPTS="$KYUUBI_CTL_JAVA_OPTS"
+
export KYUUBI_SCALA_VERSION="${KYUUBI_SCALA_VERSION:-"2.12"}"
if [[ -f ${KYUUBI_HOME}/RELEASE ]]; then
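The long `--add-opens` lists above only stay portable across JVM versions because of the first flag: `-XX:+IgnoreUnrecognizedVMOptions` lets the same option string work on Java 8, where `--add-opens` would otherwise be rejected at startup, while on Java 9+ the flags open the named modules for reflection. A standalone sketch (not part of the patch) of accumulating the opts string the same way, useful for checking what ends up exported:

```shell
#!/usr/bin/env bash
# Standalone sketch: accumulate JVM options the way load-kyuubi-env.sh does.
# -XX:+IgnoreUnrecognizedVMOptions is included so that on Java 8, where
# --add-opens is unrecognized, the JVM ignores the flags instead of
# refusing to start.
OPTS="-XX:+IgnoreUnrecognizedVMOptions"
OPTS="$OPTS --add-opens=java.base/java.lang=ALL-UNNAMED"
OPTS="$OPTS --add-opens=java.base/java.nio=ALL-UNNAMED"

# Count how many --add-opens flags were accumulated ($OPTS is left
# unquoted on purpose so it word-splits into one token per line).
count=$(printf '%s\n' $OPTS | grep -c -- '--add-opens')
echo "$count"
```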
diff --git a/build/dist b/build/dist
index e0dae3479b8..b81a2661ece 100755
--- a/build/dist
+++ b/build/dist
@@ -215,7 +215,7 @@ else
echo "Making distribution for Kyuubi $VERSION in '$DISTDIR'..."
fi
-MVN_DIST_OPT="-DskipTests"
+MVN_DIST_OPT="-DskipTests -Dmaven.javadoc.skip=true -Dmaven.scaladoc.skip=true -Dmaven.source.skip"
if [[ "$ENABLE_WEBUI" == "true" ]]; then
MVN_DIST_OPT="$MVN_DIST_OPT -Pweb-ui"
@@ -335,7 +335,7 @@ if [[ -f "$KYUUBI_HOME/tools/spark-block-cleaner/target/spark-block-cleaner_${SC
fi
# Copy Kyuubi Spark extension
-SPARK_EXTENSION_VERSIONS=('3-1' '3-2' '3-3')
+SPARK_EXTENSION_VERSIONS=('3-1' '3-2' '3-3' '3-4')
# shellcheck disable=SC2068
for SPARK_EXTENSION_VERSION in ${SPARK_EXTENSION_VERSIONS[@]}; do
if [[ -f "$KYUUBI_HOME/extensions/spark/kyuubi-extension-spark-$SPARK_EXTENSION_VERSION/target/kyuubi-extension-spark-${SPARK_EXTENSION_VERSION}_${SCALA_VERSION}-${VERSION}.jar" ]]; then
@@ -384,7 +384,11 @@ if [[ "$MAKE_TGZ" == "true" ]]; then
TARDIR="$KYUUBI_HOME/$TARDIR_NAME"
rm -rf "$TARDIR"
cp -R "$DISTDIR" "$TARDIR"
- tar czf "$TARDIR_NAME.tgz" -C "$KYUUBI_HOME" "$TARDIR_NAME"
+ TAR="tar"
+ if [ "$(uname -s)" = "Darwin" ]; then
+ TAR="tar --no-mac-metadata --no-xattrs --no-fflags"
+ fi
+ $TAR -czf "$TARDIR_NAME.tgz" -C "$KYUUBI_HOME" "$TARDIR_NAME"
rm -rf "$TARDIR"
echo "The Kyuubi tarball $TARDIR_NAME.tgz is successfully generated in $KYUUBI_HOME."
fi
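The conditional `tar` invocation above works around macOS `bsdtar` storing extended attributes, file flags, and AppleDouble metadata that show up as `._*` entries when the tarball is unpacked with GNU tar on Linux. A standalone sketch of the same OS switch (the helper name is illustrative, not part of the patch):

```shell
#!/usr/bin/env bash
# Illustrative helper: choose tar flags per OS, mirroring the build/dist
# change. macOS bsdtar would otherwise embed resource-fork metadata and
# xattrs that pollute the archive for non-macOS consumers.
tar_cmd() {
  if [ "$(uname -s)" = "Darwin" ]; then
    echo "tar --no-mac-metadata --no-xattrs --no-fflags"
  else
    echo "tar"
  fi
}

TAR=$(tar_cmd)
# As in build/dist, $TAR is later expanded unquoted so the flags
# word-split into separate arguments:  $TAR -czf out.tgz -C "$DIR" "$NAME"
echo "$TAR"
```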
diff --git a/build/kyuubi-build-info.cmd b/build/kyuubi-build-info.cmd
index 7717b48e4d0..d9e8e6c6a94 100755
--- a/build/kyuubi-build-info.cmd
+++ b/build/kyuubi-build-info.cmd
@@ -36,6 +36,7 @@ echo kyuubi_trino_version=%~9
echo user=%username%
FOR /F %%i IN ('git rev-parse HEAD') DO SET "revision=%%i"
+FOR /F "delims=" %%i IN ('git show -s --format^=%%ci HEAD') DO SET "revision_time=%%i"
FOR /F %%i IN ('git rev-parse --abbrev-ref HEAD') DO SET "branch=%%i"
FOR /F %%i IN ('git config --get remote.origin.url') DO SET "url=%%i"
@@ -44,6 +45,7 @@ FOR /f %%i IN ("%TIME%") DO SET current_time=%%i
set date=%current_date%_%current_time%
echo revision=%revision%
+echo revision_time=%revision_time%
echo branch=%branch%
echo date=%date%
echo url=%url%
diff --git a/build/mvn b/build/mvn
index 67aa02b4f79..cd6c0c796d1 100755
--- a/build/mvn
+++ b/build/mvn
@@ -35,7 +35,7 @@ fi
## Arg2 - Tarball Name
## Arg3 - Checkable Binary
install_app() {
- local remote_tarball="$1/$2"
+ local remote_tarball="$1/$2$4"
local local_tarball="${_DIR}/$2"
local binary="${_DIR}/$3"
@@ -77,12 +77,25 @@ install_mvn() {
# See simple version normalization: http://stackoverflow.com/questions/16989598/bash-comparing-version-numbers
function version { echo "$@" | awk -F. '{ printf("%03d%03d%03d\n", $1,$2,$3); }'; }
if [ $(version $MVN_DETECTED_VERSION) -ne $(version $MVN_VERSION) ]; then
- local APACHE_MIRROR=${APACHE_MIRROR:-'https://archive.apache.org/dist/'}
+ local APACHE_MIRROR=${APACHE_MIRROR:-'https://www.apache.org/dyn/closer.lua'}
+ local MIRROR_URL_QUERY="?action=download"
+ local MVN_TARBALL="apache-maven-${MVN_VERSION}-bin.tar.gz"
+ local FILE_PATH="maven/maven-3/${MVN_VERSION}/binaries"
+
+ if [ $(command -v curl) ]; then
+ if ! curl -L --output /dev/null --silent --head --fail "${APACHE_MIRROR}/${FILE_PATH}/${MVN_TARBALL}${MIRROR_URL_QUERY}" ; then
+ # Fall back to archive.apache.org for older Maven
+ echo "Falling back to archive.apache.org to download Maven"
+ APACHE_MIRROR="https://archive.apache.org/dist"
+ MIRROR_URL_QUERY=""
+ fi
+ fi
install_app \
- "${APACHE_MIRROR}/maven/maven-3/${MVN_VERSION}/binaries" \
- "apache-maven-${MVN_VERSION}-bin.tar.gz" \
- "apache-maven-${MVN_VERSION}/bin/mvn"
+ "${APACHE_MIRROR}/${FILE_PATH}" \
+ "${MVN_TARBALL}" \
+ "apache-maven-${MVN_VERSION}/bin/mvn" \
+ "${MIRROR_URL_QUERY}"
MVN_BIN="${_DIR}/apache-maven-${MVN_VERSION}/bin/mvn"
fi
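The fallback above exists because `closer.lua` redirects to regional mirrors that only carry recent releases, while `archive.apache.org` keeps every version. A sketch of the probe-then-fallback pattern with the probe injected so it can be exercised offline (function and variable names are illustrative; `build/mvn` itself probes with `curl -L --output /dev/null --silent --head --fail`):

```shell
#!/usr/bin/env bash
# Illustrative sketch, not the script's actual API: pick a download host
# by probing the preferred mirror first and falling back if the probe fails.
resolve_mirror() {
  local probe="$1" primary="$2" fallback="$3" path="$4"
  if "$probe" "${primary}/${path}"; then
    echo "$primary"
  else
    echo "$fallback"
  fi
}

probe_ok()   { return 0; }  # stands in for a 200 from curl --head --fail
probe_fail() { return 1; }  # stands in for a 404 (old release pruned)

resolve_mirror probe_ok   "https://www.apache.org/dyn/closer.lua" \
  "https://archive.apache.org/dist" "maven/maven-3/binaries"
resolve_mirror probe_fail "https://www.apache.org/dyn/closer.lua" \
  "https://archive.apache.org/dist" "maven/maven-3/binaries"
```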
diff --git a/build/mvnd b/build/mvnd
deleted file mode 100755
index 81a6f5c20a5..00000000000
--- a/build/mvnd
+++ /dev/null
@@ -1,139 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# Determine the current working directory
-_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-# Preserve the calling directory
-_CALLING_DIR="$(pwd)"
-# Options used during compilation
-_COMPILE_JVM_OPTS="-Xms2g -Xmx2g -XX:ReservedCodeCacheSize=1g -Xss128m"
-
-if [ "$CI" ]; then
- export MAVEN_CLI_OPTS="-Dmvnd.minThreads=4 --no-transfer-progress --errors --fail-fast -Dstyle.color=always"
-fi
-
-# Installs any application tarball given a URL, the expected tarball name,
-# and, optionally, a checkable binary path to determine if the binary has
-# already been installed
-## Arg1 - URL
-## Arg2 - Tarball Name
-## Arg3 - Checkable Binary
-install_app() {
- local remote_tarball="$1/$2"
- local local_tarball="${_DIR}/$2"
- local binary="${_DIR}/$3"
-
- # setup `curl` and `wget` silent options if we're running on Jenkins
- local curl_opts="-L"
- local wget_opts=""
- curl_opts="--progress-bar ${curl_opts}"
- wget_opts="--progress=bar:force ${wget_opts}"
-
- if [ -z "$3" ] || [ ! -f "$binary" ]; then
- # check if we already have the tarball
- # check if we have curl installed
- # download application
- rm -f "$local_tarball"
- [ ! -f "${local_tarball}" ] && [ "$(command -v curl)" ] && \
- echo "exec: curl ${curl_opts} ${remote_tarball}" 1>&2 && \
- curl ${curl_opts} "${remote_tarball}" > "${local_tarball}"
- # if the file still doesn't exist, lets try `wget` and cross our fingers
- [ ! -f "${local_tarball}" ] && [ "$(command -v wget)" ] && \
- echo "exec: wget ${wget_opts} ${remote_tarball}" 1>&2 && \
- wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}"
- # if both were unsuccessful, exit
- [ ! -f "${local_tarball}" ] && \
- echo -n "ERROR: Cannot download $2 with cURL or wget; " && \
- echo "please install manually and try again." && \
- exit 2
- cd "${_DIR}" && tar -xzf "$2"
- rm -rf "$local_tarball"
- fi
-}
-
-function get_os_type() {
- local unameOsOut=$(uname -s)
- local osType
- case "${unameOsOut}" in
- Linux*) osType=linux ;;
- Darwin*) osType=darwin ;;
- CYGWIN*) osType=windows ;;
- MINGW*) osType=windows ;;
- *) osType="UNKNOWN:${unameOsOut}" ;;
- esac
- echo "$osType"
-}
-
-function get_os_arch() {
- local unameArchOut="$(uname -m)"
- local arch
- case "${unameArchOut}" in
- x86_64*) arch=amd64 ;;
- arm64*) arch=aarch64 ;;
- *) arch="UNKNOWN:${unameOsOut}" ;;
- esac
- echo "$arch"
-}
-
-# Determine the Mvnd version from the root pom.xml file and
-# install mvnd under the build/ folder if needed.
-function install_mvnd() {
-  local MVND_VERSION=$(grep "<mvnd.version>" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}')
-  local MVN_VERSION=$(grep "<maven.version>" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}')
- MVND_BIN="$(command -v mvnd)"
- if [ "$MVND_BIN" ]; then
- local MVND_DETECTED_VERSION="$(mvnd -v 2>&1 | grep '(mvnd)' | awk '{print $5}')"
- local MVN_DETECTED_VERSION="$(mvnd -v 2>&1 | grep 'Apache Maven' | awk 'NR==2 {print $3}')"
- fi
- # See simple version normalization: http://stackoverflow.com/questions/16989598/bash-comparing-version-numbers
- function version { echo "$@" | awk -F. '{ printf("%03d%03d%03d\n", $1,$2,$3); }'; }
-
- if [ $(version $MVND_DETECTED_VERSION) -ne $(version $MVND_VERSION) ]; then
- local APACHE_MIRROR=${APACHE_MIRROR:-'https://downloads.apache.org'}
- local OS_TYPE=$(get_os_type)
- local ARCH=$(get_os_arch)
-
- install_app \
- "${APACHE_MIRROR}/maven/mvnd/${MVND_VERSION}" \
- "maven-mvnd-${MVND_VERSION}-${OS_TYPE}-${ARCH}.tar.gz" \
- "maven-mvnd-${MVND_VERSION}-${OS_TYPE}-${ARCH}/bin/mvnd"
-
- MVND_BIN="${_DIR}/maven-mvnd-${MVND_VERSION}-${OS_TYPE}-${ARCH}/bin/mvnd"
- else
- if [ "$(version $MVN_DETECTED_VERSION)" -ne "$(version $MVN_VERSION)" ]; then
- echo "Mvnd $MVND_DETECTED_VERSION embedded maven version $MVN_DETECTED_VERSION is not equivalent to $MVN_VERSION required in pom."
- exit 1
- fi
- fi
-}
-
-install_mvnd
-
-cd "${_CALLING_DIR}"
-
-# Set any `mvn` options if not already present
-export MAVEN_OPTS=${MAVEN_OPTS:-"$_COMPILE_JVM_OPTS"}
-
-echo "Using \`mvnd\` from path: $MVND_BIN" 1>&2
-
-if [ "$MAVEN_CLI_OPTS" != "" ]; then
- echo "MAVEN_CLI_OPTS=$MAVEN_CLI_OPTS"
-fi
-
-${MVND_BIN} $MAVEN_CLI_OPTS "$@"
diff --git a/build/release/release.sh b/build/release/release.sh
index fefcce6a913..89ecd5230b9 100755
--- a/build/release/release.sh
+++ b/build/release/release.sh
@@ -52,6 +52,21 @@ if [[ ${RELEASE_VERSION} =~ .*-SNAPSHOT ]]; then
exit 1
fi
+if [ -n "${JAVA_HOME}" ]; then
+ JAVA="${JAVA_HOME}/bin/java"
+elif [ "$(command -v java)" ]; then
+ JAVA="java"
+else
+ echo "JAVA_HOME is not set" >&2
+ exit 1
+fi
+
+JAVA_VERSION=$($JAVA -version 2>&1 | awk -F '"' '/version/ {print $2}')
+if [[ $JAVA_VERSION != 1.8.* ]]; then
+ echo "Unexpected Java version: $JAVA_VERSION. Java 8 is required for release."
+ exit 1
+fi
+
RELEASE_TAG="v${RELEASE_VERSION}-rc${RELEASE_RC_NO}"
SVN_STAGING_REPO="https://dist.apache.org/repos/dist/dev/kyuubi"
@@ -101,6 +116,9 @@ upload_nexus_staging() {
-s "${KYUUBI_DIR}/build/release/asf-settings.xml" \
-pl extensions/spark/kyuubi-extension-spark-3-2 -am
${KYUUBI_DIR}/build/mvn clean deploy -DskipTests -Papache-release,flink-provided,spark-provided,hive-provided,spark-3.3 \
+ -s "${KYUUBI_DIR}/build/release/asf-settings.xml" \
+ -pl extensions/spark/kyuubi-extension-spark-3-3 -am
+ ${KYUUBI_DIR}/build/mvn clean deploy -DskipTests -Papache-release,flink-provided,spark-provided,hive-provided,spark-3.4 \
-s "${KYUUBI_DIR}/build/release/asf-settings.xml"
}
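The guard added to `release.sh` above parses the banner that `java -version` prints to stderr. A sketch of the same `awk` extraction run against a canned banner (sample text, not live output):

```shell
#!/usr/bin/env bash
# Sketch: apply the release.sh parse to a canned `java -version` banner.
banner='openjdk version "1.8.0_372"
OpenJDK Runtime Environment (build 1.8.0_372-b07)
OpenJDK 64-Bit Server VM (build 25.372-b07, mixed mode)'

# -F '"' splits on double quotes; on the line containing "version",
# field 2 is the bare version string.
JAVA_VERSION=$(echo "$banner" | awk -F '"' '/version/ {print $2}')

# The same glob test release.sh uses to require Java 8.
if [[ $JAVA_VERSION == 1.8.* ]]; then
  echo "ok: $JAVA_VERSION"
else
  echo "reject: $JAVA_VERSION"
fi
```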
diff --git a/charts/kyuubi/Chart.yaml b/charts/kyuubi/Chart.yaml
index 0abec9e5ef3..7c881cc9ee0 100644
--- a/charts/kyuubi/Chart.yaml
+++ b/charts/kyuubi/Chart.yaml
@@ -20,7 +20,7 @@ name: kyuubi
description: A Helm chart for Kyuubi server
type: application
version: 0.1.0
-appVersion: 1.7.0
+appVersion: 1.7.2
home: https://kyuubi.apache.org
icon: https://raw.githubusercontent.com/apache/kyuubi/master/docs/imgs/logo.png
sources:
diff --git a/charts/kyuubi/README.md b/charts/kyuubi/README.md
index ef54c322605..dfec578dd7b 100644
--- a/charts/kyuubi/README.md
+++ b/charts/kyuubi/README.md
@@ -19,7 +19,7 @@
# Helm Chart for Apache Kyuubi
-[Apache Kyuubi](https://airflow.apache.org/) is a distributed and multi-tenant gateway to provide serverless SQL on Data Warehouses and Lakehouses.
+[Apache Kyuubi](https://kyuubi.apache.org) is a distributed and multi-tenant gateway to provide serverless SQL on Data Warehouses and Lakehouses.
## Introduction
@@ -32,11 +32,25 @@ cluster using the [Helm](https://helm.sh) package manager.
- Kubernetes cluster
- Helm 3.0+
+## Template rendering
+
+When you want to test the template rendering without actually installing anything, [debugging templates](https://helm.sh/docs/chart_template_guide/debugging/) provides a quick way of viewing the generated content without YAML parse errors blocking.
+
+There are two ways to render templates; both return the rendered output so you can inspect it.
+
+- Render chart templates locally
+```shell
+helm template --debug ../kyuubi
+```
+- Render chart templates server-side
+```shell
+helm install --dry-run --debug --generate-name ../kyuubi
+```
## Documentation
-Configuration guide documentation for Kyuubi lives [on the website](https://kyuubi.readthedocs.io/en/master/deployment/settings.html#kyuubi-configurations). (Not just for Helm Chart)
+Configuration guide documentation for Kyuubi lives [on the website](https://kyuubi.readthedocs.io/en/master/configuration/settings.html#kyuubi-configurations) (not just for the Helm chart).
## Contributing
diff --git a/charts/kyuubi/templates/_helpers.tpl b/charts/kyuubi/templates/_helpers.tpl
index cd4865a1288..502bf4646c1 100644
--- a/charts/kyuubi/templates/_helpers.tpl
+++ b/charts/kyuubi/templates/_helpers.tpl
@@ -17,17 +17,35 @@
{{/*
A comma separated string of enabled frontend protocols, e.g. "REST,THRIFT_BINARY".
-For details, see 'kyuubi.frontend.protocols': https://kyuubi.readthedocs.io/en/master/deployment/settings.html#frontend
+For details, see 'kyuubi.frontend.protocols': https://kyuubi.readthedocs.io/en/master/configuration/settings.html#frontend
*/}}
{{- define "kyuubi.frontend.protocols" -}}
-{{- $protocols := list }}
-{{- range $name, $frontend := .Values.server }}
- {{- if $frontend.enabled }}
- {{- $protocols = $name | snakecase | upper | append $protocols }}
+ {{- $protocols := list }}
+ {{- range $name, $frontend := .Values.server }}
+ {{- if $frontend.enabled }}
+ {{- $protocols = $name | snakecase | upper | append $protocols }}
+ {{- end }}
{{- end }}
+ {{- if not $protocols }}
+ {{ fail "At least one frontend protocol must be enabled!" }}
+ {{- end }}
+ {{- $protocols | join "," }}
{{- end }}
-{{- if not $protocols }}
- {{ fail "At least one frontend protocol must be enabled!" }}
-{{- end }}
-{{- $protocols | join "," }}
-{{- end }}
+
+{{/*
+Selector labels
+*/}}
+{{- define "kyuubi.selectorLabels" -}}
+app.kubernetes.io/name: {{ .Chart.Name }}
+app.kubernetes.io/instance: {{ .Release.Name }}
+{{- end -}}
+
+{{/*
+Common labels
+*/}}
+{{- define "kyuubi.labels" -}}
+helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
+{{ include "kyuubi.selectorLabels" . }}
+app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+{{- end -}}
diff --git a/charts/kyuubi/templates/kyuubi-alert.yaml b/charts/kyuubi/templates/kyuubi-alert.yaml
new file mode 100644
index 00000000000..8637e9e0395
--- /dev/null
+++ b/charts/kyuubi/templates/kyuubi-alert.yaml
@@ -0,0 +1,28 @@
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
+
+{{- if and .Values.server.prometheus.enabled (eq .Values.metricsReporters "PROMETHEUS") .Values.prometheusRule.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: {{ .Release.Name }}
+ labels:
+ {{- include "kyuubi.labels" . | nindent 4 }}
+spec:
+ groups:
+ {{- toYaml .Values.prometheusRule.groups | nindent 4 }}
+{{- end }}
diff --git a/charts/kyuubi/templates/kyuubi-configmap.yaml b/charts/kyuubi/templates/kyuubi-configmap.yaml
index 4964e651cdb..1e5e195d399 100644
--- a/charts/kyuubi/templates/kyuubi-configmap.yaml
+++ b/charts/kyuubi/templates/kyuubi-configmap.yaml
@@ -1,30 +1,26 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Release.Name }}
labels:
- helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
- app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ .Release.Service }}
+ {{- include "kyuubi.labels" . | nindent 4 }}
data:
{{- with .Values.kyuubiConf.kyuubiEnv }}
kyuubi-env.sh: |
@@ -41,9 +37,13 @@ data:
kyuubi.frontend.mysql.bind.port={{ .Values.server.mysql.port }}
kyuubi.frontend.protocols={{ include "kyuubi.frontend.protocols" . }}
+ # Kyuubi Metrics
+ kyuubi.metrics.enabled={{ .Values.server.prometheus.enabled }}
+ kyuubi.metrics.reporters={{ .Values.metricsReporters }}
+
## User provided Kyuubi configurations
{{- with .Values.kyuubiConf.kyuubiDefaults }}
- {{- tpl . $ | nindent 4 }}
+ {{- tpl . $ | nindent 4 }}
{{- end }}
{{- with .Values.kyuubiConf.log4j2 }}
log4j2.xml: |
diff --git a/charts/kyuubi/templates/kyuubi-headless-service.yaml b/charts/kyuubi/templates/kyuubi-headless-service.yaml
new file mode 100644
index 00000000000..895859bac2c
--- /dev/null
+++ b/charts/kyuubi/templates/kyuubi-headless-service.yaml
@@ -0,0 +1,35 @@
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
+
+apiVersion: v1
+kind: Service
+metadata:
+ name: {{ .Release.Name }}-headless
+ labels:
+ {{- include "kyuubi.labels" $ | nindent 4 }}
+spec:
+ type: ClusterIP
+ clusterIP: None
+ ports:
+ {{- range $name, $frontend := .Values.server }}
+ - name: {{ $name | kebabcase }}
+ port: {{ tpl $frontend.service.port $ }}
+ targetPort: {{ $frontend.port }}
+ {{- end }}
+ selector:
+ {{- include "kyuubi.selectorLabels" $ | nindent 4 }}
+
diff --git a/charts/kyuubi/templates/kyuubi-podmonitor.yaml b/charts/kyuubi/templates/kyuubi-podmonitor.yaml
new file mode 100644
index 00000000000..ea0f762141a
--- /dev/null
+++ b/charts/kyuubi/templates/kyuubi-podmonitor.yaml
@@ -0,0 +1,31 @@
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
+
+{{- if and .Values.server.prometheus.enabled (eq .Values.metricsReporters "PROMETHEUS") .Values.podMonitor.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+ name: {{ .Release.Name }}
+ labels:
+ {{- include "kyuubi.labels" . | nindent 4 }}
+spec:
+ selector:
+ matchLabels:
+ app: {{ .Release.Name }}
+ podMetricsEndpoints:
+ {{- toYaml .Values.podMonitor.podMetricsEndpoint | nindent 4 }}
+{{- end }}
diff --git a/charts/kyuubi/templates/kyuubi-priorityclass.yaml b/charts/kyuubi/templates/kyuubi-priorityclass.yaml
new file mode 100644
index 00000000000..c756108aeeb
--- /dev/null
+++ b/charts/kyuubi/templates/kyuubi-priorityclass.yaml
@@ -0,0 +1,26 @@
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
+
+{{- if .Values.priorityClass.create }}
+apiVersion: scheduling.k8s.io/v1
+kind: PriorityClass
+metadata:
+ name: {{ .Values.priorityClass.name | default .Release.Name }}
+ labels:
+ {{- include "kyuubi.labels" . | nindent 4 }}
+value: {{ .Values.priorityClass.value }}
+{{- end }}
diff --git a/charts/kyuubi/templates/kyuubi-role.yaml b/charts/kyuubi/templates/kyuubi-role.yaml
index fcb5a9f6e4f..5ee8c1dff5a 100644
--- a/charts/kyuubi/templates/kyuubi-role.yaml
+++ b/charts/kyuubi/templates/kyuubi-role.yaml
@@ -1,19 +1,19 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
{{- if .Values.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
@@ -21,10 +21,6 @@ kind: Role
metadata:
name: {{ .Release.Name }}
labels:
- helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
- app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ .Release.Service }}
+ {{- include "kyuubi.labels" . | nindent 4 }}
rules: {{- toYaml .Values.rbac.rules | nindent 2 }}
{{- end }}
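The switch from `#` comment headers to `{{/* ... */}}` blocks throughout these templates is not cosmetic: `#` lines in a template are plain text and pass through into every rendered manifest, while `{{/* */}}` comments are stripped by the Go template engine before the YAML is parsed. A minimal illustration (hypothetical template fragment, not from the chart):

```yaml
# This line survives rendering and is sent to the API server as-is.
{{/* This line is removed by the template engine before YAML parsing. */}}
kind: ConfigMap
```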
diff --git a/charts/kyuubi/templates/kyuubi-rolebinding.yaml b/charts/kyuubi/templates/kyuubi-rolebinding.yaml
index 8f74efc2dba..0f9dbd049c0 100644
--- a/charts/kyuubi/templates/kyuubi-rolebinding.yaml
+++ b/charts/kyuubi/templates/kyuubi-rolebinding.yaml
@@ -1,19 +1,19 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
{{- if .Values.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
@@ -21,11 +21,7 @@ kind: RoleBinding
metadata:
name: {{ .Release.Name }}
labels:
- helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
- app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ .Release.Service }}
+ {{- include "kyuubi.labels" . | nindent 4 }}
subjects:
- kind: ServiceAccount
name: {{ .Values.serviceAccount.name | default .Release.Name }}
diff --git a/charts/kyuubi/templates/kyuubi-service.yaml b/charts/kyuubi/templates/kyuubi-service.yaml
index 963f1fcc709..64c8b06ac20 100644
--- a/charts/kyuubi/templates/kyuubi-service.yaml
+++ b/charts/kyuubi/templates/kyuubi-service.yaml
@@ -1,19 +1,19 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
{{- range $name, $frontend := .Values.server }}
{{- if $frontend.enabled }}
@@ -22,14 +22,9 @@ kind: Service
metadata:
name: {{ $.Release.Name }}-{{ $name | kebabcase }}
labels:
- helm.sh/chart: {{ $.Chart.Name }}-{{ $.Chart.Version }}
- app.kubernetes.io/name: {{ $.Chart.Name }}
- app.kubernetes.io/instance: {{ $.Release.Name }}
- app.kubernetes.io/version: {{ $.Values.image.tag | default $.Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ $.Release.Service }}
+ {{- include "kyuubi.labels" $ | nindent 4 }}
{{- with $frontend.service.annotations }}
- annotations:
- {{- toYaml . | nindent 4 }}
+ annotations: {{- toYaml . | nindent 4 }}
{{- end }}
spec:
type: {{ $frontend.service.type }}
@@ -41,8 +36,7 @@ spec:
nodePort: {{ $frontend.service.nodePort }}
{{- end }}
selector:
- app.kubernetes.io/name: {{ $.Chart.Name }}
- app.kubernetes.io/instance: {{ $.Release.Name }}
+ {{- include "kyuubi.selectorLabels" $ | nindent 4 }}
---
{{- end }}
{{- end }}
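The hunks above collapse the repeated per-resource label blocks into shared named templates. The helper definitions themselves are not part of this diff; a plausible sketch of what `templates/_helpers.tpl` might contain, reconstructed from the label lines being removed (the exact upstream definitions may differ):

```yaml
{{/*
Common labels -- a sketch only; the real definitions live in _helpers.tpl,
which this diff does not show.
*/}}
{{- define "kyuubi.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{ include "kyuubi.selectorLabels" . }}
{{- end }}

{{/*
Selector labels -- kept separate because selectors are immutable and must
stay stable across chart upgrades.
*/}}
{{- define "kyuubi.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
```

Splitting `kyuubi.labels` from `kyuubi.selectorLabels` follows the common Helm convention: chart/version labels may change on every release, while selector labels must not.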
diff --git a/charts/kyuubi/templates/kyuubi-serviceaccount.yaml b/charts/kyuubi/templates/kyuubi-serviceaccount.yaml
index 770d5013669..a8e282a1fba 100644
--- a/charts/kyuubi/templates/kyuubi-serviceaccount.yaml
+++ b/charts/kyuubi/templates/kyuubi-serviceaccount.yaml
@@ -1,19 +1,19 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
{{- if .Values.serviceAccount.create }}
apiVersion: v1
@@ -21,9 +21,5 @@ kind: ServiceAccount
metadata:
name: {{ .Values.serviceAccount.name | default .Release.Name }}
labels:
- helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
- app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ .Release.Service }}
+ {{- include "kyuubi.labels" . | nindent 4 }}
{{- end }}
diff --git a/charts/kyuubi/templates/kyuubi-servicemonitor.yaml b/charts/kyuubi/templates/kyuubi-servicemonitor.yaml
new file mode 100644
index 00000000000..7d997fc1199
--- /dev/null
+++ b/charts/kyuubi/templates/kyuubi-servicemonitor.yaml
@@ -0,0 +1,31 @@
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
+
+{{- if and .Values.server.prometheus.enabled (eq .Values.metricsReporters "PROMETHEUS") .Values.serviceMonitor.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+ name: {{ .Release.Name }}
+ labels:
+ {{- include "kyuubi.labels" . | nindent 4 }}
+spec:
+ selector:
+ matchLabels:
+ app: {{ .Release.Name }}
+ endpoints:
+ {{- toYaml .Values.serviceMonitor.endpoints | nindent 4 }}
+{{- end }}
diff --git a/charts/kyuubi/templates/kyuubi-deployment.yaml b/charts/kyuubi/templates/kyuubi-statefulset.yaml
similarity index 55%
rename from charts/kyuubi/templates/kyuubi-deployment.yaml
rename to charts/kyuubi/templates/kyuubi-statefulset.yaml
index 43899b6fc51..626796a78d6 100644
--- a/charts/kyuubi/templates/kyuubi-deployment.yaml
+++ b/charts/kyuubi/templates/kyuubi-statefulset.yaml
@@ -1,48 +1,54 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+{{/*
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+*/}}
apiVersion: apps/v1
-kind: Deployment
+kind: StatefulSet
metadata:
name: {{ .Release.Name }}
labels:
- helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
- app.kubernetes.io/version: {{ .Values.image.tag | default .Chart.AppVersion | quote }}
- app.kubernetes.io/managed-by: {{ .Release.Service }}
+ {{- include "kyuubi.labels" . | nindent 4 }}
spec:
- replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
+ {{- include "kyuubi.selectorLabels" . | nindent 6 }}
+ serviceName: {{ .Release.Name }}-headless
+ minReadySeconds: {{ .Values.minReadySeconds }}
+ replicas: {{ .Values.replicaCount }}
+ revisionHistoryLimit: {{ .Values.revisionHistoryLimit }}
+ podManagementPolicy: {{ .Values.podManagementPolicy }}
+ {{- with .Values.updateStrategy }}
+ updateStrategy: {{- toYaml . | nindent 4 }}
+ {{- end }}
template:
metadata:
labels:
- app.kubernetes.io/name: {{ .Chart.Name }}
- app.kubernetes.io/instance: {{ .Release.Name }}
+ {{- include "kyuubi.selectorLabels" . | nindent 8 }}
annotations:
checksum/conf: {{ include (print $.Template.BasePath "/kyuubi-configmap.yaml") . | sha256sum }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets: {{- toYaml . | nindent 8 }}
{{- end }}
+ {{- if or .Values.serviceAccount.name .Values.serviceAccount.create }}
serviceAccountName: {{ .Values.serviceAccount.name | default .Release.Name }}
+ {{- end }}
+ {{- if or .Values.priorityClass.name .Values.priorityClass.create }}
+ priorityClassName: {{ .Values.priorityClass.name | default .Release.Name }}
+ {{- end }}
{{- with .Values.initContainers }}
initContainers: {{- tpl (toYaml .) $ | nindent 8 }}
{{- end }}
@@ -69,28 +75,28 @@ spec:
containerPort: {{ $frontend.port }}
{{- end }}
{{- end }}
- {{- if .Values.probe.liveness.enabled }}
+ {{- if .Values.livenessProbe.enabled }}
livenessProbe:
exec:
command: ["/bin/bash", "-c", "bin/kyuubi status"]
- initialDelaySeconds: {{ .Values.probe.liveness.initialDelaySeconds }}
- periodSeconds: {{ .Values.probe.liveness.periodSeconds }}
- timeoutSeconds: {{ .Values.probe.liveness.timeoutSeconds }}
- failureThreshold: {{ .Values.probe.liveness.failureThreshold }}
- successThreshold: {{ .Values.probe.liveness.successThreshold }}
+ initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.livenessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.livenessProbe.failureThreshold }}
+ successThreshold: {{ .Values.livenessProbe.successThreshold }}
{{- end }}
- {{- if .Values.probe.readiness.enabled }}
+ {{- if .Values.readinessProbe.enabled }}
readinessProbe:
exec:
command: ["/bin/bash", "-c", "$KYUUBI_HOME/bin/kyuubi status"]
- initialDelaySeconds: {{ .Values.probe.readiness.initialDelaySeconds }}
- periodSeconds: {{ .Values.probe.readiness.periodSeconds }}
- timeoutSeconds: {{ .Values.probe.readiness.timeoutSeconds }}
- failureThreshold: {{ .Values.probe.readiness.failureThreshold }}
- successThreshold: {{ .Values.probe.readiness.successThreshold }}
+ initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }}
+ periodSeconds: {{ .Values.readinessProbe.periodSeconds }}
+ timeoutSeconds: {{ .Values.readinessProbe.timeoutSeconds }}
+ failureThreshold: {{ .Values.readinessProbe.failureThreshold }}
+ successThreshold: {{ .Values.readinessProbe.successThreshold }}
{{- end }}
{{- with .Values.resources }}
- resources: {{- toYaml . | nindent 12 }}
+ resources: {{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: conf
diff --git a/charts/kyuubi/values.yaml b/charts/kyuubi/values.yaml
index 7eca7211393..cfc79fae5be 100644
--- a/charts/kyuubi/values.yaml
+++ b/charts/kyuubi/values.yaml
@@ -22,6 +22,26 @@
# Kyuubi server numbers
replicaCount: 2
+# Controls how Kyuubi server pods are created during initial scale-up,
+# when replacing pods on nodes, or when scaling down.
+# The default policy is `OrderedReady`; the alternative is `Parallel`.
+podManagementPolicy: OrderedReady
+
+# Minimum number of seconds for which a newly created Kyuubi server pod
+# should be ready, without any of its containers crashing, to be considered available.
+minReadySeconds: 30
+
+# Maximum number of revisions maintained in the StatefulSet's revision history.
+revisionHistoryLimit: 10
+
+# Indicates the StatefulSetUpdateStrategy employed to update Kyuubi server pods in the StatefulSet
+# when a revision is made to the pod template.
+updateStrategy:
+ type: RollingUpdate
+ rollingUpdate:
+ maxUnavailable: 1
+ partition: 0
+
image:
repository: apache/kyuubi
pullPolicy: IfNotPresent
@@ -29,34 +49,32 @@ image:
imagePullSecrets: []
-# ServiceAccount used for Kyuubi create/list/delete pod in kubernetes
+# ServiceAccount used by Kyuubi to create/list/delete pods in Kubernetes
serviceAccount:
+ # Specifies whether a ServiceAccount should be created
create: true
+ # Specifies ServiceAccount name to be used (created if `create: true`)
+ name: ~
+
+# priorityClass used for Kyuubi server pod
+priorityClass:
+ # Specifies whether a priorityClass should be created
+ create: false
+ # Specifies priorityClass name to be used (created if `create: true`)
name: ~
+  # Half of system-cluster-critical (2000000000) by default
+ value: 1000000000
+# Role-based access control
rbac:
+ # Specifies whether RBAC resources should be created
create: true
+ # RBAC rules
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["create", "list", "delete"]
-probe:
- liveness:
- enabled: true
- initialDelaySeconds: 30
- periodSeconds: 10
- timeoutSeconds: 2
- failureThreshold: 10
- successThreshold: 1
- readiness:
- enabled: true
- initialDelaySeconds: 30
- periodSeconds: 10
- timeoutSeconds: 2
- failureThreshold: 10
- successThreshold: 1
-
server:
# Thrift Binary protocol (HiveServer2 compatible)
thriftBinary:
@@ -98,37 +116,43 @@ server:
nodePort: ~
annotations: {}
+ # Exposes metrics in Prometheus format
+ prometheus:
+ enabled: true
+ port: 10019
+ service:
+ type: ClusterIP
+ port: "{{ .Values.server.prometheus.port }}"
+ nodePort: ~
+ annotations: {}
+
+# $KYUUBI_CONF_DIR directory
kyuubiConfDir: /opt/kyuubi/conf
+# Kyuubi configurations files
kyuubiConf:
# The value (templated string) is used for kyuubi-env.sh file
- # Example:
- #
- # kyuubiEnv: |
- # export JAVA_HOME=/usr/jdk64/jdk1.8.0_152
- # export SPARK_HOME=/opt/spark
- # export FLINK_HOME=/opt/flink
- # export HIVE_HOME=/opt/hive
- #
- # See example at conf/kyuubi-env.sh.template and https://kyuubi.readthedocs.io/en/master/deployment/settings.html#environments for more details
+ # See example at conf/kyuubi-env.sh.template and https://kyuubi.readthedocs.io/en/master/configuration/settings.html#environments for more details
kyuubiEnv: ~
+ # kyuubiEnv: |
+ # export JAVA_HOME=/usr/jdk64/jdk1.8.0_152
+ # export SPARK_HOME=/opt/spark
+ # export FLINK_HOME=/opt/flink
+ # export HIVE_HOME=/opt/hive
# The value (templated string) is used for kyuubi-defaults.conf file
- # Example:
- #
- # kyuubiDefaults: |
- # kyuubi.authentication=NONE
- # kyuubi.frontend.bind.host=10.0.0.1
- # kyuubi.engine.type=SPARK_SQL
- # kyuubi.engine.share.level=USER
- # kyuubi.session.engine.initialize.timeout=PT3M
- # kyuubi.ha.addresses=zk1:2181,zk2:2181,zk3:2181
- # kyuubi.ha.namespace=kyuubi
- #
- # See https://kyuubi.readthedocs.io/en/master/deployment/settings.html#kyuubi-configurations for more details
+ # See https://kyuubi.readthedocs.io/en/master/configuration/settings.html#kyuubi-configurations for more details
kyuubiDefaults: ~
+ # kyuubiDefaults: |
+ # kyuubi.authentication=NONE
+ # kyuubi.frontend.bind.host=10.0.0.1
+ # kyuubi.engine.type=SPARK_SQL
+ # kyuubi.engine.share.level=USER
+ # kyuubi.session.engine.initialize.timeout=PT3M
+ # kyuubi.ha.addresses=zk1:2181,zk2:2181,zk3:2181
+ # kyuubi.ha.namespace=kyuubi
# The value (templated string) is used for log4j2.xml file
- # See example at conf/log4j2.xml.template https://kyuubi.readthedocs.io/en/master/deployment/settings.html#logging for more details
+ # See example at conf/log4j2.xml.template https://kyuubi.readthedocs.io/en/master/configuration/settings.html#logging for more details
log4j2: ~
# Command to launch Kyuubi server (templated)
@@ -138,6 +162,7 @@ args: ~
# Environment variables (templated)
env: []
+# Environment variables from ConfigMaps and Secrets (templated)
envFrom: []
# Additional volumes for Kyuubi pod (templated)
@@ -150,21 +175,67 @@ initContainers: []
# Additional containers for Kyuubi pod (templated)
containers: []
+# Resource requests and limits for Kyuubi pods
resources: {}
- # Used to specify resource, default unlimited.
- # If you do want to specify resources:
- # 1. remove the curly braces after 'resources:'
- # 2. uncomment the following lines
- # limits:
- # cpu: 4
- # memory: 10Gi
- # requests:
- # cpu: 2
- # memory: 4Gi
-
-# Constrain Kyuubi server pods to specific nodes
+# resources:
+# requests:
+# cpu: 2
+# memory: 4Gi
+# limits:
+# cpu: 4
+# memory: 10Gi
+
+# Liveness probe
+livenessProbe:
+ enabled: true
+ initialDelaySeconds: 30
+ periodSeconds: 10
+ timeoutSeconds: 2
+ failureThreshold: 10
+ successThreshold: 1
+
+# Readiness probe
+readinessProbe:
+ enabled: true
+ initialDelaySeconds: 30
+ periodSeconds: 10
+ timeoutSeconds: 2
+ failureThreshold: 10
+ successThreshold: 1
+
+# Constrain Kyuubi pods to nodes with specific node labels
nodeSelector: {}
+# Allow to schedule Kyuubi pods on nodes with matching taints
tolerations: []
+# Constrain Kyuubi pods to nodes by complex affinity/anti-affinity rules
affinity: {}
+# Kyuubi pods security context
securityContext: {}
+
+# Monitoring Kyuubi - Server Metrics
+# PROMETHEUS - PrometheusReporter which exposes metrics in Prometheus format
+metricsReporters: ~
+
+# Prometheus pod monitor
+podMonitor:
+  # If enabled, a PodMonitor for the Kyuubi server pods will be created
+ enabled: false
+  # The podMetricsEndpoint specifies metrics scraping details such as port, interval, and scheme.
+  # This information configures the endpoint from which Prometheus scrapes metrics for a specific pod in Kubernetes.
+ podMetricsEndpoint: []
+
+# Prometheus service monitor
+serviceMonitor:
+ # If enabled, ServiceMonitor resources for Prometheus Operator are created
+ enabled: false
+ # The endpoints section in a ServiceMonitor specifies the metrics information for each target endpoint.
+ # This allows you to collect metrics from multiple Services across your Kubernetes cluster in a standardized and automated way.
+ endpoints: []
+
+# Rules for the Prometheus Operator
+prometheusRule:
+ # If enabled, a PrometheusRule resource for Prometheus Operator is created
+ enabled: false
+ # Contents of Prometheus rules file
+ groups: []
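Taken together, the new values wire up an optional Prometheus metrics endpoint plus operator resources. Per the condition in `kyuubi-servicemonitor.yaml`, the ServiceMonitor renders only when `server.prometheus.enabled`, `serviceMonitor.enabled`, and `metricsReporters: PROMETHEUS` all hold. A hedged example override, assuming only the keys introduced above (the entries under `endpoints` follow the Prometheus Operator schema; the port name and interval here are illustrative, not taken from this diff):

```yaml
# values-monitoring.yaml -- illustrative override, not part of the chart
metricsReporters: PROMETHEUS

server:
  prometheus:
    enabled: true
    port: 10019

serviceMonitor:
  enabled: true
  endpoints:
    - port: prometheus   # assumed port name; must match the Service's port name
      interval: 30s
      path: /metrics
```

Applied with something like `helm upgrade --install kyuubi charts/kyuubi -f values-monitoring.yaml`, leaving all other defaults from `values.yaml` in place.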
diff --git a/conf/kyuubi-defaults.conf.template b/conf/kyuubi-defaults.conf.template
index c93971d9150..eef36ad10c3 100644
--- a/conf/kyuubi-defaults.conf.template
+++ b/conf/kyuubi-defaults.conf.template
@@ -33,4 +33,4 @@
# kyuubi.ha.namespace kyuubi
#
-# Details in https://kyuubi.readthedocs.io/en/master/deployment/settings.html
+# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
diff --git a/conf/log4j2.xml.template b/conf/log4j2.xml.template
index 37fc8acf036..86f9459a11e 100644
--- a/conf/log4j2.xml.template
+++ b/conf/log4j2.xml.template
@@ (log4j2.xml hunks: the surrounding XML markup was lost in extraction; the recoverable
content adds a k8s-audit.log / k8s-audit-%d{yyyy-MM-dd}-%i.log rolling log file alongside
the existing rest-audit.log / rest-audit-%d{yyyy-MM-dd}-%i.log, together with the
associated appender and logger entries)
diff --git a/dev/dependencyList b/dev/dependencyList
index ab7697d3516..0675f56f04a 100644
--- a/dev/dependencyList
+++ b/dev/dependencyList
@@ -22,37 +22,34 @@ annotations/4.1.1.4//annotations-4.1.1.4.jar
antlr-runtime/3.5.3//antlr-runtime-3.5.3.jar
antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar
aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-arrow-format/11.0.0//arrow-format-11.0.0.jar
-arrow-memory-core/11.0.0//arrow-memory-core-11.0.0.jar
-arrow-memory-netty/11.0.0//arrow-memory-netty-11.0.0.jar
-arrow-vector/11.0.0//arrow-vector-11.0.0.jar
+arrow-format/12.0.0//arrow-format-12.0.0.jar
+arrow-memory-core/12.0.0//arrow-memory-core-12.0.0.jar
+arrow-memory-netty/12.0.0//arrow-memory-netty-12.0.0.jar
+arrow-vector/12.0.0//arrow-vector-12.0.0.jar
classgraph/4.8.138//classgraph-4.8.138.jar
commons-codec/1.15//commons-codec-1.15.jar
commons-collections/3.2.2//commons-collections-3.2.2.jar
commons-lang/2.6//commons-lang-2.6.jar
-commons-lang3/3.12.0//commons-lang3-3.12.0.jar
+commons-lang3/3.13.0//commons-lang3-3.13.0.jar
commons-logging/1.1.3//commons-logging-1.1.3.jar
-curator-client/2.12.0//curator-client-2.12.0.jar
-curator-framework/2.12.0//curator-framework-2.12.0.jar
-curator-recipes/2.12.0//curator-recipes-2.12.0.jar
derby/10.14.2.0//derby-10.14.2.0.jar
error_prone_annotations/2.14.0//error_prone_annotations-2.14.0.jar
failsafe/2.4.4//failsafe-2.4.4.jar
failureaccess/1.0.1//failureaccess-1.0.1.jar
flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
fliptables/1.0.2//fliptables-1.0.2.jar
-grpc-api/1.48.0//grpc-api-1.48.0.jar
-grpc-context/1.48.0//grpc-context-1.48.0.jar
-grpc-core/1.48.0//grpc-core-1.48.0.jar
-grpc-grpclb/1.48.0//grpc-grpclb-1.48.0.jar
-grpc-netty/1.48.0//grpc-netty-1.48.0.jar
-grpc-protobuf-lite/1.48.0//grpc-protobuf-lite-1.48.0.jar
-grpc-protobuf/1.48.0//grpc-protobuf-1.48.0.jar
-grpc-stub/1.48.0//grpc-stub-1.48.0.jar
+grpc-api/1.53.0//grpc-api-1.53.0.jar
+grpc-context/1.53.0//grpc-context-1.53.0.jar
+grpc-core/1.53.0//grpc-core-1.53.0.jar
+grpc-grpclb/1.53.0//grpc-grpclb-1.53.0.jar
+grpc-netty/1.53.0//grpc-netty-1.53.0.jar
+grpc-protobuf-lite/1.53.0//grpc-protobuf-lite-1.53.0.jar
+grpc-protobuf/1.53.0//grpc-protobuf-1.53.0.jar
+grpc-stub/1.53.0//grpc-stub-1.53.0.jar
gson/2.9.0//gson-2.9.0.jar
-guava/31.1-jre//guava-31.1-jre.jar
-hadoop-client-api/3.3.4//hadoop-client-api-3.3.4.jar
-hadoop-client-runtime/3.3.4//hadoop-client-runtime-3.3.4.jar
+guava/32.0.1-jre//guava-32.0.1-jre.jar
+hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar
+hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar
hive-common/3.1.3//hive-common-3.1.3.jar
hive-metastore/3.1.3//hive-metastore-3.1.3.jar
hive-serde/3.1.3//hive-serde-3.1.3.jar
@@ -68,16 +65,16 @@ httpclient/4.5.14//httpclient-4.5.14.jar
httpcore/4.4.16//httpcore-4.4.16.jar
httpmime/4.5.14//httpmime-4.5.14.jar
j2objc-annotations/1.3//j2objc-annotations-1.3.jar
-jackson-annotations/2.14.2//jackson-annotations-2.14.2.jar
-jackson-core/2.14.2//jackson-core-2.14.2.jar
-jackson-databind/2.14.2//jackson-databind-2.14.2.jar
-jackson-dataformat-yaml/2.14.2//jackson-dataformat-yaml-2.14.2.jar
-jackson-datatype-jdk8/2.14.2//jackson-datatype-jdk8-2.14.2.jar
-jackson-datatype-jsr310/2.14.2//jackson-datatype-jsr310-2.14.2.jar
-jackson-jaxrs-base/2.14.2//jackson-jaxrs-base-2.14.2.jar
-jackson-jaxrs-json-provider/2.14.2//jackson-jaxrs-json-provider-2.14.2.jar
-jackson-module-jaxb-annotations/2.14.2//jackson-module-jaxb-annotations-2.14.2.jar
-jackson-module-scala_2.12/2.14.2//jackson-module-scala_2.12-2.14.2.jar
+jackson-annotations/2.15.0//jackson-annotations-2.15.0.jar
+jackson-core/2.15.0//jackson-core-2.15.0.jar
+jackson-databind/2.15.0//jackson-databind-2.15.0.jar
+jackson-dataformat-yaml/2.15.0//jackson-dataformat-yaml-2.15.0.jar
+jackson-datatype-jdk8/2.15.0//jackson-datatype-jdk8-2.15.0.jar
+jackson-datatype-jsr310/2.15.0//jackson-datatype-jsr310-2.15.0.jar
+jackson-jaxrs-base/2.15.0//jackson-jaxrs-base-2.15.0.jar
+jackson-jaxrs-json-provider/2.15.0//jackson-jaxrs-json-provider-2.15.0.jar
+jackson-module-jaxb-annotations/2.15.0//jackson-module-jaxb-annotations-2.15.0.jar
+jackson-module-scala_2.12/2.15.0//jackson-module-scala_2.12-2.15.0.jar
jakarta.annotation-api/1.3.5//jakarta.annotation-api-1.3.5.jar
jakarta.inject/2.6.1//jakarta.inject-2.6.1.jar
jakarta.servlet-api/4.0.4//jakarta.servlet-api-4.0.4.jar
@@ -86,51 +83,55 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
javassist/3.25.0-GA//javassist-3.25.0-GA.jar
jcl-over-slf4j/1.7.36//jcl-over-slf4j-1.7.36.jar
-jersey-client/2.39//jersey-client-2.39.jar
-jersey-common/2.39//jersey-common-2.39.jar
-jersey-container-servlet-core/2.39//jersey-container-servlet-core-2.39.jar
-jersey-entity-filtering/2.39//jersey-entity-filtering-2.39.jar
-jersey-hk2/2.39//jersey-hk2-2.39.jar
-jersey-media-json-jackson/2.39//jersey-media-json-jackson-2.39.jar
-jersey-media-multipart/2.39//jersey-media-multipart-2.39.jar
-jersey-server/2.39//jersey-server-2.39.jar
+jersey-client/2.39.1//jersey-client-2.39.1.jar
+jersey-common/2.39.1//jersey-common-2.39.1.jar
+jersey-container-servlet-core/2.39.1//jersey-container-servlet-core-2.39.1.jar
+jersey-entity-filtering/2.39.1//jersey-entity-filtering-2.39.1.jar
+jersey-hk2/2.39.1//jersey-hk2-2.39.1.jar
+jersey-media-json-jackson/2.39.1//jersey-media-json-jackson-2.39.1.jar
+jersey-media-multipart/2.39.1//jersey-media-multipart-2.39.1.jar
+jersey-server/2.39.1//jersey-server-2.39.1.jar
jetcd-api/0.7.3//jetcd-api-0.7.3.jar
jetcd-common/0.7.3//jetcd-common-0.7.3.jar
jetcd-core/0.7.3//jetcd-core-0.7.3.jar
jetcd-grpc/0.7.3//jetcd-grpc-0.7.3.jar
-jetty-http/9.4.50.v20221201//jetty-http-9.4.50.v20221201.jar
-jetty-io/9.4.50.v20221201//jetty-io-9.4.50.v20221201.jar
-jetty-security/9.4.50.v20221201//jetty-security-9.4.50.v20221201.jar
-jetty-server/9.4.50.v20221201//jetty-server-9.4.50.v20221201.jar
-jetty-servlet/9.4.50.v20221201//jetty-servlet-9.4.50.v20221201.jar
-jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-client/9.4.52.v20230823//jetty-client-9.4.52.v20230823.jar
+jetty-http/9.4.52.v20230823//jetty-http-9.4.52.v20230823.jar
+jetty-io/9.4.52.v20230823//jetty-io-9.4.52.v20230823.jar
+jetty-proxy/9.4.52.v20230823//jetty-proxy-9.4.52.v20230823.jar
+jetty-security/9.4.52.v20230823//jetty-security-9.4.52.v20230823.jar
+jetty-server/9.4.52.v20230823//jetty-server-9.4.52.v20230823.jar
+jetty-servlet/9.4.52.v20230823//jetty-servlet-9.4.52.v20230823.jar
+jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar
+jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar
jline/0.9.94//jline-0.9.94.jar
jul-to-slf4j/1.7.36//jul-to-slf4j-1.7.36.jar
-kubernetes-client-api/6.4.1//kubernetes-client-api-6.4.1.jar
-kubernetes-client/6.4.1//kubernetes-client-6.4.1.jar
-kubernetes-httpclient-okhttp/6.4.1//kubernetes-httpclient-okhttp-6.4.1.jar
-kubernetes-model-admissionregistration/6.4.1//kubernetes-model-admissionregistration-6.4.1.jar
-kubernetes-model-apiextensions/6.4.1//kubernetes-model-apiextensions-6.4.1.jar
-kubernetes-model-apps/6.4.1//kubernetes-model-apps-6.4.1.jar
-kubernetes-model-autoscaling/6.4.1//kubernetes-model-autoscaling-6.4.1.jar
-kubernetes-model-batch/6.4.1//kubernetes-model-batch-6.4.1.jar
-kubernetes-model-certificates/6.4.1//kubernetes-model-certificates-6.4.1.jar
-kubernetes-model-common/6.4.1//kubernetes-model-common-6.4.1.jar
-kubernetes-model-coordination/6.4.1//kubernetes-model-coordination-6.4.1.jar
-kubernetes-model-core/6.4.1//kubernetes-model-core-6.4.1.jar
-kubernetes-model-discovery/6.4.1//kubernetes-model-discovery-6.4.1.jar
-kubernetes-model-events/6.4.1//kubernetes-model-events-6.4.1.jar
-kubernetes-model-extensions/6.4.1//kubernetes-model-extensions-6.4.1.jar
-kubernetes-model-flowcontrol/6.4.1//kubernetes-model-flowcontrol-6.4.1.jar
-kubernetes-model-gatewayapi/6.4.1//kubernetes-model-gatewayapi-6.4.1.jar
-kubernetes-model-metrics/6.4.1//kubernetes-model-metrics-6.4.1.jar
-kubernetes-model-networking/6.4.1//kubernetes-model-networking-6.4.1.jar
-kubernetes-model-node/6.4.1//kubernetes-model-node-6.4.1.jar
-kubernetes-model-policy/6.4.1//kubernetes-model-policy-6.4.1.jar
-kubernetes-model-rbac/6.4.1//kubernetes-model-rbac-6.4.1.jar
-kubernetes-model-scheduling/6.4.1//kubernetes-model-scheduling-6.4.1.jar
-kubernetes-model-storageclass/6.4.1//kubernetes-model-storageclass-6.4.1.jar
+kafka-clients/3.4.0//kafka-clients-3.4.0.jar
+kubernetes-client-api/6.8.1//kubernetes-client-api-6.8.1.jar
+kubernetes-client/6.8.1//kubernetes-client-6.8.1.jar
+kubernetes-httpclient-okhttp/6.8.1//kubernetes-httpclient-okhttp-6.8.1.jar
+kubernetes-model-admissionregistration/6.8.1//kubernetes-model-admissionregistration-6.8.1.jar
+kubernetes-model-apiextensions/6.8.1//kubernetes-model-apiextensions-6.8.1.jar
+kubernetes-model-apps/6.8.1//kubernetes-model-apps-6.8.1.jar
+kubernetes-model-autoscaling/6.8.1//kubernetes-model-autoscaling-6.8.1.jar
+kubernetes-model-batch/6.8.1//kubernetes-model-batch-6.8.1.jar
+kubernetes-model-certificates/6.8.1//kubernetes-model-certificates-6.8.1.jar
+kubernetes-model-common/6.8.1//kubernetes-model-common-6.8.1.jar
+kubernetes-model-coordination/6.8.1//kubernetes-model-coordination-6.8.1.jar
+kubernetes-model-core/6.8.1//kubernetes-model-core-6.8.1.jar
+kubernetes-model-discovery/6.8.1//kubernetes-model-discovery-6.8.1.jar
+kubernetes-model-events/6.8.1//kubernetes-model-events-6.8.1.jar
+kubernetes-model-extensions/6.8.1//kubernetes-model-extensions-6.8.1.jar
+kubernetes-model-flowcontrol/6.8.1//kubernetes-model-flowcontrol-6.8.1.jar
+kubernetes-model-gatewayapi/6.8.1//kubernetes-model-gatewayapi-6.8.1.jar
+kubernetes-model-metrics/6.8.1//kubernetes-model-metrics-6.8.1.jar
+kubernetes-model-networking/6.8.1//kubernetes-model-networking-6.8.1.jar
+kubernetes-model-node/6.8.1//kubernetes-model-node-6.8.1.jar
+kubernetes-model-policy/6.8.1//kubernetes-model-policy-6.8.1.jar
+kubernetes-model-rbac/6.8.1//kubernetes-model-rbac-6.8.1.jar
+kubernetes-model-resource/6.8.1//kubernetes-model-resource-6.8.1.jar
+kubernetes-model-scheduling/6.8.1//kubernetes-model-scheduling-6.8.1.jar
+kubernetes-model-storageclass/6.8.1//kubernetes-model-storageclass-6.8.1.jar
libfb303/0.9.3//libfb303-0.9.3.jar
libthrift/0.9.3//libthrift-0.9.3.jar
log4j-1.2-api/2.20.0//log4j-1.2-api-2.20.0.jar
@@ -138,28 +139,29 @@ log4j-api/2.20.0//log4j-api-2.20.0.jar
log4j-core/2.20.0//log4j-core-2.20.0.jar
log4j-slf4j-impl/2.20.0//log4j-slf4j-impl-2.20.0.jar
logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
+lz4-java/1.8.0//lz4-java-1.8.0.jar
metrics-core/4.2.8//metrics-core-4.2.8.jar
metrics-jmx/4.2.8//metrics-jmx-4.2.8.jar
metrics-json/4.2.8//metrics-json-4.2.8.jar
metrics-jvm/4.2.8//metrics-jvm-4.2.8.jar
mimepull/1.9.15//mimepull-1.9.15.jar
-netty-all/4.1.89.Final//netty-all-4.1.89.Final.jar
-netty-buffer/4.1.89.Final//netty-buffer-4.1.89.Final.jar
-netty-codec-dns/4.1.89.Final//netty-codec-dns-4.1.89.Final.jar
-netty-codec-http/4.1.89.Final//netty-codec-http-4.1.89.Final.jar
-netty-codec-http2/4.1.89.Final//netty-codec-http2-4.1.89.Final.jar
-netty-codec-socks/4.1.89.Final//netty-codec-socks-4.1.89.Final.jar
-netty-codec/4.1.89.Final//netty-codec-4.1.89.Final.jar
-netty-common/4.1.89.Final//netty-common-4.1.89.Final.jar
-netty-handler-proxy/4.1.89.Final//netty-handler-proxy-4.1.89.Final.jar
-netty-handler/4.1.89.Final//netty-handler-4.1.89.Final.jar
-netty-resolver-dns/4.1.89.Final//netty-resolver-dns-4.1.89.Final.jar
-netty-resolver/4.1.89.Final//netty-resolver-4.1.89.Final.jar
-netty-transport-classes-epoll/4.1.89.Final//netty-transport-classes-epoll-4.1.89.Final.jar
-netty-transport-native-epoll/4.1.89.Final/linux-aarch_64/netty-transport-native-epoll-4.1.89.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.89.Final/linux-x86_64/netty-transport-native-epoll-4.1.89.Final-linux-x86_64.jar
-netty-transport-native-unix-common/4.1.89.Final//netty-transport-native-unix-common-4.1.89.Final.jar
-netty-transport/4.1.89.Final//netty-transport-4.1.89.Final.jar
+netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar
+netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar
+netty-codec-dns/4.1.93.Final//netty-codec-dns-4.1.93.Final.jar
+netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar
+netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar
+netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar
+netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar
+netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar
+netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar
+netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar
+netty-resolver-dns/4.1.93.Final//netty-resolver-dns-4.1.93.Final.jar
+netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar
+netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar
+netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar
+netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar
+netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar
+netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar
okhttp-urlconnection/3.14.9//okhttp-urlconnection-3.14.9.jar
okhttp/3.12.12//okhttp-3.12.12.jar
okio/1.15.0//okio-1.15.0.jar
@@ -169,7 +171,7 @@ perfmark-api/0.25.0//perfmark-api-0.25.0.jar
proto-google-common-protos/2.9.0//proto-google-common-protos-2.9.0.jar
protobuf-java-util/3.21.7//protobuf-java-util-3.21.7.jar
protobuf-java/3.21.7//protobuf-java-3.21.7.jar
-scala-library/2.12.17//scala-library-2.12.17.jar
+scala-library/2.12.18//scala-library-2.12.18.jar
scopt_2.12/4.1.0//scopt_2.12-4.1.0.jar
simpleclient/0.16.0//simpleclient-0.16.0.jar
simpleclient_common/0.16.0//simpleclient_common-0.16.0.jar
@@ -180,7 +182,10 @@ simpleclient_tracer_common/0.16.0//simpleclient_tracer_common-0.16.0.jar
simpleclient_tracer_otel/0.16.0//simpleclient_tracer_otel-0.16.0.jar
simpleclient_tracer_otel_agent/0.16.0//simpleclient_tracer_otel_agent-0.16.0.jar
slf4j-api/1.7.36//slf4j-api-1.7.36.jar
-snakeyaml/1.33//snakeyaml-1.33.jar
+snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar
+snakeyaml/2.2//snakeyaml-2.2.jar
+snappy-java/1.1.8.4//snappy-java-1.1.8.4.jar
+sqlite-jdbc/3.42.0.0//sqlite-jdbc-3.42.0.0.jar
swagger-annotations/2.2.1//swagger-annotations-2.2.1.jar
swagger-core/2.2.1//swagger-core-2.2.1.jar
swagger-integration/2.2.1//swagger-integration-2.2.1.jar
@@ -192,4 +197,4 @@ units/1.6//units-1.6.jar
vertx-core/4.3.2//vertx-core-4.3.2.jar
vertx-grpc/4.3.2//vertx-grpc-4.3.2.jar
zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
-zookeeper/3.4.14//zookeeper-3.4.14.jar
+zstd-jni/1.5.2-1//zstd-jni-1.5.2-1.jar
diff --git a/dev/gen/gen_all_config_docs.sh b/dev/gen/gen_all_config_docs.sh
new file mode 100755
index 00000000000..2a5dca7f952
--- /dev/null
+++ b/dev/gen/gen_all_config_docs.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# docs/deployment/settings.md
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean test \
+ -pl kyuubi-server -am \
+ -Pflink-provided,spark-provided,hive-provided \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.config.AllKyuubiConfiguration
diff --git a/dev/gen/gen_hive_kdf_docs.sh b/dev/gen/gen_hive_kdf_docs.sh
new file mode 100755
index 00000000000..b670dc3c531
--- /dev/null
+++ b/dev/gen/gen_hive_kdf_docs.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# docs/extensions/engines/hive/functions.md
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean test \
+ -pl externals/kyuubi-hive-sql-engine -am \
+ -Pflink-provided,spark-provided,hive-provided \
+ -DwildcardSuites=org.apache.kyuubi.engine.hive.udf.KyuubiDefinedFunctionSuite
diff --git a/dev/gen/gen_ranger_policy_json.sh b/dev/gen/gen_ranger_policy_json.sh
new file mode 100755
index 00000000000..1f4193d3e1f
--- /dev/null
+++ b/dev/gen/gen_ranger_policy_json.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# extensions/spark/kyuubi-spark-authz/src/test/resources/sparkSql_hive_jenkins.json
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean test \
+ -pl extensions/spark/kyuubi-spark-authz \
+ -Pgen-policy \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.plugin.spark.authz.gen.PolicyJsonFileGenerator
diff --git a/dev/gen/gen_ranger_spec_json.sh b/dev/gen/gen_ranger_spec_json.sh
new file mode 100755
index 00000000000..e00857f8f23
--- /dev/null
+++ b/dev/gen/gen_ranger_spec_json.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# extensions/spark/kyuubi-spark-authz/src/main/resources/*_spec.json
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean test \
+ -pl extensions/spark/kyuubi-spark-authz \
+ -Pgen-policy \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.plugin.spark.authz.gen.JsonSpecFileGenerator
diff --git a/dev/gen/gen_spark_kdf_docs.sh b/dev/gen/gen_spark_kdf_docs.sh
new file mode 100755
index 00000000000..ac13082e31e
--- /dev/null
+++ b/dev/gen/gen_spark_kdf_docs.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# docs/extensions/engines/spark/functions.md
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean test \
+ -pl externals/kyuubi-spark-sql-engine -am \
+ -Pflink-provided,spark-provided,hive-provided \
+ -DwildcardSuites=org.apache.kyuubi.engine.spark.udf.KyuubiDefinedFunctionSuite
diff --git a/dev/gen/gen_tpcds_output_schema.sh b/dev/gen/gen_tpcds_output_schema.sh
new file mode 100755
index 00000000000..49f8d77988a
--- /dev/null
+++ b/dev/gen/gen_tpcds_output_schema.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# extensions/spark/kyuubi-spark-authz/src/test/resources/*.output.schema
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean install \
+ -pl kyuubi-server -am \
+ -Dmaven.plugin.scalatest.exclude.tags="" \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.operation.tpcds.OutputSchemaTPCDSSuite
diff --git a/dev/gen/gen_tpcds_queries.sh b/dev/gen/gen_tpcds_queries.sh
new file mode 100755
index 00000000000..07f075b7a88
--- /dev/null
+++ b/dev/gen/gen_tpcds_queries.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# kyuubi-spark-connector-tpcds/src/main/resources/kyuubi/tpcds_*/*.sql
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean install \
+ -pl extensions/spark/kyuubi-spark-connector-tpcds -am \
+ -Dmaven.plugin.scalatest.exclude.tags="" \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.spark.connector.tpcds.TPCDSQuerySuite
diff --git a/dev/gen/gen_tpch_queries.sh b/dev/gen/gen_tpch_queries.sh
new file mode 100755
index 00000000000..d0c65256f01
--- /dev/null
+++ b/dev/gen/gen_tpch_queries.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Golden result file:
+# kyuubi-spark-connector-tpch/src/main/resources/kyuubi/tpch_*/*.sql
+
+KYUUBI_UPDATE="${KYUUBI_UPDATE:-1}" \
+build/mvn clean install \
+ -pl extensions/spark/kyuubi-spark-connector-tpch -am \
+ -Dmaven.plugin.scalatest.exclude.tags="" \
+ -Dtest=none \
+ -DwildcardSuites=org.apache.kyuubi.spark.connector.tpch.TPCHQuerySuite
diff --git a/dev/kyuubi-codecov/pom.xml b/dev/kyuubi-codecov/pom.xml
index ba15ec0f823..31b9d27bc03 100644
--- a/dev/kyuubi-codecov/pom.xml
+++ b/dev/kyuubi-codecov/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../pom.xml</relativePath>
     </parent>
 
-    <artifactId>kyuubi-codecov_2.12</artifactId>
+    <artifactId>kyuubi-codecov_${scala.binary.version}</artifactId>
     <packaging>pom</packaging>
     <name>Kyuubi Dev Code Coverage</name>
     <url>https://kyuubi.apache.org/</url>
@@ -199,7 +199,17 @@
         <dependency>
             <groupId>org.apache.kyuubi</groupId>
-            <artifactId>kyuubi-spark-connector-kudu_${scala.binary.version}</artifactId>
+            <artifactId>kyuubi-spark-connector-hive_${scala.binary.version}</artifactId>
             <version>${project.version}</version>
         </dependency>
     </dependencies>
 
+    <profiles>
+        <profile>
+            <id>spark-3.4</id>
+            <dependencies>
+                <dependency>
+                    <groupId>org.apache.kyuubi</groupId>
+                    <artifactId>kyuubi-extension-spark-3-4_${scala.binary.version}</artifactId>
+                    <version>${project.version}</version>
+                </dependency>
+            </dependencies>
+        </profile>
+    </profiles>
diff --git a/dev/kyuubi-tpcds/pom.xml b/dev/kyuubi-tpcds/pom.xml
index 1bc69f9f2ce..b80c1227fc2 100644
--- a/dev/kyuubi-tpcds/pom.xml
+++ b/dev/kyuubi-tpcds/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../pom.xml</relativePath>
     </parent>
 
-    <artifactId>kyuubi-tpcds_2.12</artifactId>
+    <artifactId>kyuubi-tpcds_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
    <name>Kyuubi Dev TPCDS Generator</name>
    <url>https://kyuubi.apache.org/</url>
diff --git a/dev/merge_kyuubi_pr.py b/dev/merge_kyuubi_pr.py
index cb3696d1f98..fe889374867 100755
--- a/dev/merge_kyuubi_pr.py
+++ b/dev/merge_kyuubi_pr.py
@@ -30,9 +30,9 @@
import re
import subprocess
import sys
-from urllib.request import urlopen
-from urllib.request import Request
from urllib.error import HTTPError
+from urllib.request import Request
+from urllib.request import urlopen
KYUUBI_HOME = os.environ.get("KYUUBI_HOME", os.getcwd())
PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache")
@@ -248,6 +248,8 @@ def main():
user_login = pr["user"]["login"]
base_ref = pr["head"]["ref"]
pr_repo_desc = "%s/%s" % (user_login, base_ref)
+ assignees = pr["assignees"]
+ milestone = pr["milestone"]
# Merged pull requests don't appear as merged in the GitHub API;
# Instead, they're closed by asfgit.
@@ -276,6 +278,17 @@ def main():
print("\n=== Pull Request #%s ===" % pr_num)
print("title:\t%s\nsource:\t%s\ntarget:\t%s\nurl:\t%s\nbody:\n\n%s" %
(title, pr_repo_desc, target_ref, url, body))
+
+ if assignees is None or len(assignees) == 0:
+ continue_maybe("Assignees have NOT been set. Continue?")
+ else:
+ print("assignees: %s" % [assignee["login"] for assignee in assignees])
+
+ if milestone is None:
+ continue_maybe("Milestone has NOT been set. Continue?")
+ else:
+ print("milestone: %s" % milestone["title"])
+
continue_maybe("Proceed with merging pull request #%s?" % pr_num)
merged_refs = [target_ref]
diff --git a/dev/reformat b/dev/reformat
index 7c6ef712485..6346e68f68d 100755
--- a/dev/reformat
+++ b/dev/reformat
@@ -20,7 +20,7 @@ set -x
KYUUBI_HOME="$(cd "`dirname "$0"`/.."; pwd)"
-PROFILES="-Pflink-provided,hive-provided,spark-provided,spark-block-cleaner,spark-3.3,spark-3.2,spark-3.1,tpcds"
+PROFILES="-Pflink-provided,hive-provided,spark-provided,spark-block-cleaner,spark-3.4,spark-3.3,spark-3.2,spark-3.1,tpcds"
# python style checks rely on `black` in path
if ! command -v black &> /dev/null
diff --git a/docker/kyuubi-configmap.yaml b/docker/kyuubi-configmap.yaml
index 9b799359625..6a6d430ce58 100644
--- a/docker/kyuubi-configmap.yaml
+++ b/docker/kyuubi-configmap.yaml
@@ -52,4 +52,4 @@ data:
# kyuubi.frontend.bind.port 10009
#
- # Details in https://kyuubi.readthedocs.io/en/master/deployment/settings.html
+ # Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
diff --git a/docker/playground/.env b/docker/playground/.env
index ea214551182..5c3d124a7d1 100644
--- a/docker/playground/.env
+++ b/docker/playground/.env
@@ -15,16 +15,16 @@
# limitations under the License.
#
-AWS_JAVA_SDK_VERSION=1.12.316
-HADOOP_VERSION=3.3.5
+AWS_JAVA_SDK_VERSION=1.12.367
+HADOOP_VERSION=3.3.6
HIVE_VERSION=2.3.9
-ICEBERG_VERSION=1.2.0
-KYUUBI_VERSION=1.7.0
-KYUUBI_HADOOP_VERSION=3.3.4
+ICEBERG_VERSION=1.3.1
+KYUUBI_VERSION=1.7.2
+KYUUBI_HADOOP_VERSION=3.3.5
POSTGRES_VERSION=12
POSTGRES_JDBC_VERSION=42.3.4
SCALA_BINARY_VERSION=2.12
-SPARK_VERSION=3.3.2
+SPARK_VERSION=3.3.3
SPARK_BINARY_VERSION=3.3
SPARK_HADOOP_VERSION=3.3.2
ZOOKEEPER_VERSION=3.6.3
diff --git a/docker/playground/compose.yml b/docker/playground/compose.yml
index b0d2b1ea89f..362b3505be1 100644
--- a/docker/playground/compose.yml
+++ b/docker/playground/compose.yml
@@ -21,7 +21,7 @@ services:
environment:
MINIO_ROOT_USER: minio
MINIO_ROOT_PASSWORD: minio_minio
- MINIO_DEFAULT_BUCKETS: spark-bucket,iceberg-bucket
+ MINIO_DEFAULT_BUCKETS: spark-bucket
container_name: minio
hostname: minio
ports:
diff --git a/docker/playground/conf/kyuubi-defaults.conf b/docker/playground/conf/kyuubi-defaults.conf
index 15b3fbf6e4b..e4a674634d4 100644
--- a/docker/playground/conf/kyuubi-defaults.conf
+++ b/docker/playground/conf/kyuubi-defaults.conf
@@ -30,4 +30,4 @@ kyuubi.operation.progress.enabled=true
kyuubi.engine.session.initialize.sql \
show namespaces in tpcds; \
show namespaces in tpch; \
- show namespaces in postgres;
+ show namespaces in postgres
diff --git a/docker/playground/conf/spark-defaults.conf b/docker/playground/conf/spark-defaults.conf
index 9d1d4a6028b..7983b5e705c 100644
--- a/docker/playground/conf/spark-defaults.conf
+++ b/docker/playground/conf/spark-defaults.conf
@@ -38,7 +38,3 @@ spark.sql.catalog.postgres.url=jdbc:postgresql://postgres:5432/metastore
spark.sql.catalog.postgres.driver=org.postgresql.Driver
spark.sql.catalog.postgres.user=postgres
spark.sql.catalog.postgres.password=postgres
-
-spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
-spark.sql.catalog.iceberg.type=hadoop
-spark.sql.catalog.iceberg.warehouse=s3a://iceberg-bucket/iceberg-warehouse
diff --git a/docker/playground/image/kyuubi-playground-base.Dockerfile b/docker/playground/image/kyuubi-playground-base.Dockerfile
index 6ee4ed40519..e8375eb68b8 100644
--- a/docker/playground/image/kyuubi-playground-base.Dockerfile
+++ b/docker/playground/image/kyuubi-playground-base.Dockerfile
@@ -20,4 +20,4 @@ RUN set -x && \
mkdir /opt/busybox && \
busybox --install /opt/busybox
-ENV PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/busybox
+ENV PATH=${PATH}:/opt/busybox
diff --git a/docs/appendix/terminology.md b/docs/appendix/terminology.md
index b81fa25fe87..b349d77c7bd 100644
--- a/docs/appendix/terminology.md
+++ b/docs/appendix/terminology.md
@@ -129,9 +129,9 @@ As an enterprise service, SLA commitment is essential. Deploying Kyuubi in High
-## DataLake & LakeHouse
+## DataLake & Lakehouse
-Kyuubi unifies DataLake & LakeHouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.
+Kyuubi unifies DataLake & Lakehouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.
### Apache Iceberg
diff --git a/docs/client/advanced/kerberos.md b/docs/client/advanced/kerberos.md
index 4962dd2c8b2..a9cb5581227 100644
--- a/docs/client/advanced/kerberos.md
+++ b/docs/client/advanced/kerberos.md
@@ -242,5 +242,5 @@ jdbc:hive2://:/;kyuubiServerPrinc
- `principal` is inherited from Hive JDBC Driver and is a little ambiguous, and we could use `kyuubiServerPrincipal` as its alias.
- `kyuubi_server_principal` is the value of `kyuubi.kinit.principal` set in `kyuubi-defaults.conf`.
- As a command line argument, JDBC URL should be quoted to avoid being split into 2 commands by ";".
-- As to DBeaver, `;principal=<kyuubi_server_principal>` should be set as the `Database/Schema` argument.
+- As to DBeaver, `;principal=<kyuubi_server_principal>` or `;kyuubiServerPrincipal=<kyuubi_server_principal>` should be set as the `Database/Schema` argument.
diff --git a/docs/client/jdbc/hive_jdbc.md b/docs/client/jdbc/hive_jdbc.md
index 42d2f7b5a33..00498dfaa01 100644
--- a/docs/client/jdbc/hive_jdbc.md
+++ b/docs/client/jdbc/hive_jdbc.md
@@ -19,14 +19,18 @@
## Instructions
-Kyuubi does not provide its own JDBC Driver so far,
-as it is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query,
-analyze and visualize data though Spark SQL engines.
+Kyuubi is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI)
+tools to query, analyze and visualize data through Spark SQL engines.
+
+It's recommended to use [Kyuubi JDBC driver](./kyuubi_jdbc.html) for new applications.
## Install Hive JDBC
For programming, the easiest way to get `hive-jdbc` is from [the maven central](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc). For example,
+The following sections demonstrate how to use Hive JDBC driver 2.3.8 to connect to Kyuubi Server; actually, any version
+less than or equal to 3.1.x should work fine.
+
- **maven**
```xml
@@ -76,7 +80,3 @@ jdbc:hive2://:/;?#<[spark|hive]Var
jdbc:hive2://localhost:10009/default;hive.server2.proxy.user=proxy_user?kyuubi.engine.share.level=CONNECTION;spark.ui.enabled=false#var_x=y
```
-## Unsupported Hive Features
-
-- Connect to HiveServer2 using HTTP transport. ```transportMode=http```
-
diff --git a/docs/client/jdbc/kyuubi_jdbc.rst b/docs/client/jdbc/kyuubi_jdbc.rst
index fdc40d599eb..d4270ea8ac6 100644
--- a/docs/client/jdbc/kyuubi_jdbc.rst
+++ b/docs/client/jdbc/kyuubi_jdbc.rst
@@ -17,14 +17,14 @@ Kyuubi Hive JDBC Driver
=======================
.. versionadded:: 1.4.0
- Since 1.4.0, kyuubi community maintains a forked hive jdbc driver module and provides both shaded and non-shaded packages.
+ The Kyuubi community maintains a forked Hive JDBC driver module and provides both shaded and non-shaded packages.
-This packages aims to support some missing functionalities of the original hive jdbc.
-For kyuubi engines that support multiple catalogs, it provides meta APIs for better support.
-The behaviors of the original hive jdbc have remained.
+This package aims to support some missing functionalities of the original Hive JDBC driver.
+For Kyuubi engines that support multiple catalogs, it provides meta APIs for better support.
+The behaviors of the original Hive JDBC driver are preserved.
-To access a Hive data warehouse or new lakehouse formats, such as Apache Iceberg/Hudi, delta lake using the kyuubi jdbc driver for Apache kyuubi, you need to configure
-the following:
+To access a Hive data warehouse or new Lakehouse formats, such as Apache Iceberg, Apache Hudi, or Delta Lake, using the Kyuubi
+JDBC driver for Apache Kyuubi, you need to configure the following:
- The list of driver library files - :ref:`referencing-libraries`.
- The Driver or DataSource class - :ref:`registering_class`.
@@ -46,28 +46,28 @@ In the code, specify the artifact `kyuubi-hive-jdbc-shaded` from `Maven Central`
Maven
^^^^^
-.. code-block:: xml
+.. parsed-literal::
    <dependency>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
-       <version>1.5.2-incubating</version>
+       <version>\ |release|\ </version>
    </dependency>
-Sbt
+sbt
^^^
-.. code-block:: sbt
+.. parsed-literal::
- libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "1.5.2-incubating"
+ libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "\ |release|\"
Gradle
^^^^^^
-.. code-block:: gradle
+.. parsed-literal::
- implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '1.5.2-incubating'
+ implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '\ |release|\'
Using the Driver in a JDBC Application
**************************************
@@ -92,11 +92,9 @@ connection for JDBC:
.. code-block:: java
- private static Connection connectViaDM() throws Exception
- {
- Connection connection = null;
- connection = DriverManager.getConnection(CONNECTION_URL);
- return connection;
+ private static Connection newKyuubiConnection() throws Exception {
+ Connection connection = DriverManager.getConnection(CONNECTION_URL);
+ return connection;
}
.. _building_url:
@@ -112,12 +110,13 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive
.. code-block:: jdbc
- jdbc:subprotocol://host:port/schema;<clientProperties;><[#|?]sessionProperties>
+ jdbc:subprotocol://host:port[/catalog]/[schema];<clientProperties;><[#|?]sessionProperties>
- subprotocol: kyuubi or hive2
- host: DNS or IP address of the kyuubi server
- port: The number of the TCP port that the server uses to listen for client requests
-- dbName: Optional database name to set the current database to run the query against, use `default` if absent.
+- catalog: Optional catalog name to set the current catalog to run the query against.
+- schema: Optional database name to set the current database to run the query against, use `default` if absent.
- clientProperties: Optional `semicolon(;)` separated `key=value` parameters identified and affect the client behavior locally. e.g., user=foo;password=bar.
- sessionProperties: Optional `semicolon(;)` separated `key=value` parameters used to configure the session, operation or background engines.
For instance, `kyuubi.engine.share.level=CONNECTION` determines the background engine instance is used only by the current connection. `spark.ui.enabled=false` disables the Spark UI of the engine.
@@ -127,7 +126,7 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive
- Properties are case-sensitive
- Do not duplicate properties in the connection URL
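The URL anatomy documented above (subprotocol, host, port, optional catalog/schema path, semicolon-separated client properties) can be sketched as a small illustrative parser. This is not part of any Kyuubi artifact and not the driver's actual parsing logic; it is only a rough model of the documented format, and it ignores session properties after `?` or `#`:

```python
import re

# Rough model of jdbc:subprotocol://host:port[/catalog]/[schema];<clientProperties>
# Not the Kyuubi JDBC driver's real parser; session properties after '?'/'#' are ignored.
URL_RE = re.compile(
    r"jdbc:(kyuubi|hive2)://([^:/;]+):(\d+)"  # subprotocol, host, port
    r"((?:/[^/;?#]*){0,2})"                   # optional /catalog and /schema segments
    r"(?:;([^?#]*))?"                         # optional ;key=value client properties
)

def parse_kyuubi_url(url):
    m = URL_RE.match(url)
    if m is None:
        raise ValueError("not a recognizable Kyuubi JDBC URL: %s" % url)
    subprotocol, host, port, path, props = m.groups()
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2:
        catalog, schema = parts
    else:
        # schema defaults to `default` when absent, per the docs above
        catalog, schema = None, (parts[0] if parts else "default")
    client_props = dict(kv.split("=", 1) for kv in props.split(";")) if props else {}
    return {"subprotocol": subprotocol, "host": host, "port": int(port),
            "catalog": catalog, "schema": schema, "clientProperties": client_props}

print(parse_kyuubi_url("jdbc:kyuubi://localhost:10009/default;user=foo;password=bar"))
```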
-Connection URL over Http
+Connection URL over HTTP
************************
.. versionadded:: 1.6.0
@@ -145,16 +144,78 @@ Connection URL over Service Discovery
 jdbc:subprotocol://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
-- zookeeper quorum is the corresponding zookeeper cluster configured by `kyuubi.ha.zookeeper.quorum` at the server side.
-- zooKeeperNamespace is the corresponding namespace configured by `kyuubi.ha.zookeeper.namespace` at the server side.
+- zookeeper quorum is the corresponding zookeeper cluster configured by `kyuubi.ha.addresses` at the server side.
+- zooKeeperNamespace is the corresponding namespace configured by `kyuubi.ha.namespace` at the server side.
-Authentication
---------------
+Kerberos Authentication
+-----------------------
+Since 1.6.0, the Kyuubi JDBC driver implements Kerberos authentication based on the JAAS framework instead of `Hadoop UserGroupInformation`_,
+which means it does not forcibly rely on Hadoop dependencies to connect to a kerberized Kyuubi Server.
+The Kyuubi JDBC driver supports different approaches to connect to a kerberized Kyuubi Server. First of all, please follow
+the `krb5.conf instruction`_ to set up ``krb5.conf`` properly.
-DataTypes
----------
+Authentication by Principal and Keytab
+**************************************
+
+.. versionadded:: 1.6.0
+
+.. tip::
+
+ It's the simplest way, with minimal setup requirements, for Kerberos authentication.
+
+It's straightforward to use principal and keytab for Kerberos authentication: simply configure them in the JDBC URL.
+
+.. code-block::
+
+ jdbc:kyuubi://host:port/schema;kyuubiClientPrincipal=<clientPrincipal>;kyuubiClientKeytab=<clientKeytab>;kyuubiServerPrincipal=<serverPrincipal>
+
+- kyuubiClientPrincipal: Kerberos ``principal`` for client authentication
+- kyuubiClientKeytab: path of Kerberos ``keytab`` file for client authentication
+- kyuubiServerPrincipal: Kerberos ``principal`` configured by `kyuubi.kinit.principal` at the server side. ``kyuubiServerPrincipal`` is available
+ as an alias of ``principal`` since 1.7.0, use ``principal`` for previous versions.
+
+Authentication by Principal and TGT Cache
+*****************************************
+
+Another typical usage of Kerberos authentication is using `kinit` to generate the TGT cache first, then the application
+does Kerberos authentication through the TGT cache.
+
+.. code-block::
+
+ jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=<serverPrincipal>
+
+Authentication by `Hadoop UserGroupInformation`_ ``doAs`` (programming only)
+********************************************************************************
+
+.. tip::
+
+ This approach allows projects which already use `Hadoop UserGroupInformation`_ for Kerberos authentication to easily
+ connect to a kerberized Kyuubi Server. This approach does not work between [1.6.0, 1.7.0], and was fixed in 1.7.1.
+
+.. code-block::
+
+ String jdbcUrl = "jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=<serverPrincipal>";
+ UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytab(clientPrincipal, clientKeytab);
+ ugi.doAs((PrivilegedExceptionAction) () -> {
+ Connection conn = DriverManager.getConnection(jdbcUrl);
+ ...
+ });
+
+Authentication by Subject (programming only)
+*********************************************
+
+.. code-block:: java
+
+    String jdbcUrl = "jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=;kerberosAuthType=fromSubject";
+ Subject kerberizedSubject = ...;
+ Subject.doAs(kerberizedSubject, (PrivilegedExceptionAction) () -> {
+ Connection conn = DriverManager.getConnection(jdbcUrl);
+ ...
+ });
.. _Maven Central: https://mvnrepository.com/artifact/org.apache.kyuubi/kyuubi-hive-jdbc-shaded
.. _JDBC Applications: ../bi_tools/index.html
.. _java.sql.DriverManager: https://docs.oracle.com/javase/8/docs/api/java/sql/DriverManager.html
+.. _Hadoop UserGroupInformation: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/security/UserGroupInformation.html
+.. _krb5.conf instruction: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/KerberosReq.html
\ No newline at end of file
diff --git a/docs/client/python/index.rst b/docs/client/python/index.rst
index 70d2bc9e3db..5e8ae4228ac 100644
--- a/docs/client/python/index.rst
+++ b/docs/client/python/index.rst
@@ -22,4 +22,4 @@ Python
pyhive
pyspark
-
+ jaydebeapi
diff --git a/docs/client/python/jaydebeapi.md b/docs/client/python/jaydebeapi.md
new file mode 100644
index 00000000000..3d89fd72298
--- /dev/null
+++ b/docs/client/python/jaydebeapi.md
@@ -0,0 +1,87 @@
+
+
+# Python-JayDeBeApi
+
+The [JayDeBeApi](https://pypi.org/project/JayDeBeApi/) module allows you to connect from Python code to databases using Java JDBC.
+It provides a Python DB-API v2.0 interface to that database.
+
+## Requirements
+
+To install Python-JayDeBeApi, you can use pip, the Python package manager. Open your command-line interface or terminal and run the following command:
+
+```shell
+pip install jaydebeapi
+```
+
+If you want to install JayDeBeApi in Jython, ensure that you have either pip or EasyInstall available for Jython, as these tools are used to install Python packages.
+Alternatively, you can get a copy of the source by cloning the [JayDeBeApi GitHub project](https://github.com/baztian/jaydebeapi) and install it from source:
+
+```shell
+python setup.py install
+```
+
+or if you are using Jython use
+
+```shell
+jython setup.py install
+```
+
+## Preparation
+
+To connect to Kyuubi with the Python-JayDeBeApi package, you need to install the library and configure the relevant JDBC driver. You can download the JDBC driver from the Maven repository and specify its path in Python. Choose the matching `kyuubi-hive-jdbc-*.jar` package based on your Kyuubi server version.
+The driver class name is `org.apache.kyuubi.jdbc.KyuubiHiveDriver`.
+
+| Package | Repo |
+|--------------------|-----------------------------------------------------------------------------------------------------|
+| kyuubi jdbc driver | [kyuubi-hive-jdbc-*.jar](https://repo1.maven.org/maven2/org/apache/kyuubi/kyuubi-hive-jdbc-shaded/) |
+
+## Usage
+
+Below is a simple example demonstrating how to use Python-JayDeBeApi to connect to a Kyuubi server and execute a query:
+
+```python
+import jaydebeapi
+
+# Set JDBC driver path and connection URL
+driver = "org.apache.kyuubi.jdbc.KyuubiHiveDriver"
+url = "jdbc:kyuubi://host:port/default"
+jdbc_driver_path = ["/path/to/kyuubi-hive-jdbc-*.jar"]
+
+# Connect to the database using JayDeBeApi
+conn = jaydebeapi.connect(driver, url, ["user", "password"], jdbc_driver_path)
+
+# Create a cursor object
+cursor = conn.cursor()
+
+# Execute the SQL query
+cursor.execute("SELECT * FROM example_table LIMIT 10")
+
+# Retrieve query results
+result_set = cursor.fetchall()
+
+# Process the results
+for row in result_set:
+ print(row)
+
+# Close the cursor and the connection
+cursor.close()
+conn.close()
+```
+
+Make sure to replace the placeholders (host, port, user, password) with your actual Kyuubi configuration.
+With the above code, you can connect to Kyuubi and execute SQL queries in Python. Please handle exceptions and errors appropriately in real-world applications.
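Because JayDeBeApi implements the standard Python DB-API v2.0, the cursor workflow above is identical to any other DB-API driver. As a runnable illustration of the same pattern that needs no JVM or JDBC driver, here is the identical flow against Python's built-in `sqlite3` (also a DB-API v2.0 module):

```python
import sqlite3

# Same DB-API v2.0 workflow as the JayDeBeApi example above,
# with sqlite3 standing in for the JDBC-backed connection.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE example_table (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO example_table VALUES (?, ?)",
                   [(1, "a"), (2, "b")])

cursor.execute("SELECT * FROM example_table LIMIT 10")
result_set = cursor.fetchall()
for row in result_set:
    print(row)

cursor.close()
conn.close()
```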
diff --git a/docs/client/python/pyhive.md b/docs/client/python/pyhive.md
index dbebf684fc0..b5e57ea2eae 100644
--- a/docs/client/python/pyhive.md
+++ b/docs/client/python/pyhive.md
@@ -64,7 +64,47 @@ If password is provided for connection, make sure the `auth` param set to either
```python
# open connection
-conn = hive.Connection(host=kyuubi_host,port=10009,
-user='user', password='password', auth='CUSTOM')
+conn = hive.Connection(host=kyuubi_host, port=10009,
+ username='user', password='password', auth='CUSTOM')
+```
+
+Use Kerberos to connect to Kyuubi.
+
+`kerberos_service_name` must be the name of the service that started the Kyuubi server, usually the part of `kyuubi.kinit.principal` before the first slash.
+
+Note that PyHive does not support passing in a full `principal`; it builds the server principal from `kerberos_service_name` and the `kyuubi_host` it connects to.
+
+```python
+# open connection
+conn = hive.Connection(host=kyuubi_host, port=10009, auth="KERBEROS", kerberos_service_name="kyuubi")
+```
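To illustrate the note above: with the settings in the snippet, the server principal that the SASL layer ends up targeting is roughly the service name combined with the connection host. This is a sketch of the naming convention with hypothetical values, not PyHive's exact internal code:

```python
# Hypothetical values; the realm is resolved from the Kerberos config.
kerberos_service_name = "kyuubi"
kyuubi_host = "kyuubi1.example.com"

# Roughly how the target server principal is derived (realm omitted):
derived_principal = f"{kerberos_service_name}/{kyuubi_host}"
print(derived_principal)
```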
+
+If you encounter the following error, you need to install the related SASL packages:
+
+```
+thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'
+```
+
+```bash
+yum install -y cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-md5
+```
+
+Note that PyHive does not support ZooKeeper-based HA connection strings; instead, you can connect to ZooKeeper yourself to get a service address via [Kazoo](https://pypi.org/project/kazoo/).
+
+Code reference [https://stackoverflow.com/a/73326589](https://stackoverflow.com/a/73326589)
+
+```python
+from pyhive import hive
+import random
+from kazoo.client import KazooClient
+zk = KazooClient(hosts='kyuubi1.xx.com:2181,kyuubi2.xx.com:2181,kyuubi3.xx.com:2181', read_only=True)
+zk.start()
+servers = [kyuubi_server.split(';')[0].split('=')[1].split(':')
+ for kyuubi_server
+ in zk.get_children(path='kyuubi')]
+kyuubi_host, kyuubi_port = random.choice(servers)
+zk.stop()
+print(kyuubi_host, kyuubi_port)
+conn = hive.Connection(host=kyuubi_host, port=kyuubi_port, auth="KERBEROS", kerberos_service_name="kyuubi")
```
diff --git a/docs/client/rest/rest_api.md b/docs/client/rest/rest_api.md
index fbff59f0500..fc04857d020 100644
--- a/docs/client/rest/rest_api.md
+++ b/docs/client/rest/rest_api.md
@@ -449,7 +449,13 @@ Refresh the Hadoop configurations of the Kyuubi server.
### POST /admin/refresh/user_defaults_conf
-Refresh the [user defaults configs](../../deployment/settings.html#user-defaults) with key in format in the form of `___{username}___.{config key}` from default property file.
+Refresh the [user defaults configs](../../configuration/settings.html#user-defaults) with keys in the form of `___{username}___.{config key}` from the default property file.
+
+### POST /admin/refresh/kubernetes_conf
+
+Refresh the Kubernetes configs with keys prefixed with `kyuubi.kubernetes` from the default property file.
+
+This is helpful if you need to support multiple Kubernetes contexts and namespaces, see [KYUUBI #4843](https://github.com/apache/kyuubi/issues/4843).
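A hedged example of invoking this endpoint, assuming the default REST frontend port `10099` and the `/api/v1` base path; adapt the host and any authentication flags to your deployment:

```shell
# Ask the Kyuubi server to reload kyuubi.kubernetes.* configs
# from its default property file.
curl -X POST http://localhost:10099/api/v1/admin/refresh/kubernetes_conf
```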
### DELETE /admin/engine
diff --git a/docs/community/release.md b/docs/community/release.md
index 8252669c0dc..f2c8541b1e1 100644
--- a/docs/community/release.md
+++ b/docs/community/release.md
@@ -191,6 +191,7 @@ The tag pattern is `v${RELEASE_VERSION}-rc${RELEASE_RC_NO}`, e.g. `v1.7.0-rc0`
```shell
# Bump to the release version
build/mvn versions:set -DgenerateBackupPoms=false -DnewVersion="${RELEASE_VERSION}"
+(cd kyuubi-server/web-ui && npm version "${RELEASE_VERSION}")
git commit -am "[RELEASE] Bump ${RELEASE_VERSION}"
# Create tag
@@ -198,6 +199,7 @@ git tag v${RELEASE_VERSION}-rc${RELEASE_RC_NO}
# Prepare for the next development version
build/mvn versions:set -DgenerateBackupPoms=false -DnewVersion="${NEXT_VERSION}-SNAPSHOT"
+(cd kyuubi-server/web-ui && npm version "${NEXT_VERSION}-SNAPSHOT")
git commit -am "[RELEASE] Bump ${NEXT_VERSION}-SNAPSHOT"
# Push branch to apache remote repo
@@ -299,6 +301,9 @@ svn delete https://dist.apache.org/repos/dist/dev/kyuubi/{RELEASE_TAG} \
--message "Remove deprecated Apache Kyuubi ${RELEASE_TAG}"
```
-## Publish docker image
+## Keep other artifacts up-to-date
+
+- Docker Image: https://github.com/apache/kyuubi-docker/blob/master/release/release_guide.md
+- Helm Charts: https://github.com/apache/kyuubi/blob/master/charts/kyuubi/Chart.yaml
+- Playground: https://github.com/apache/kyuubi/blob/master/docker/playground/.env
-See steps in `https://github.com/apache/kyuubi-docker/blob/master/release/release_guide.md`
diff --git a/docs/conf.py b/docs/conf.py
index dcf038314c5..eaac1acedef 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -77,9 +77,11 @@
'sphinx.ext.napoleon',
'sphinx.ext.mathjax',
'recommonmark',
+ 'sphinx_copybutton',
'sphinx_markdown_tables',
'sphinx_togglebutton',
'notfound.extension',
+ 'sphinxemoji.sphinxemoji',
]
master_doc = 'index'
diff --git a/docs/deployment/settings.md b/docs/configuration/settings.md
similarity index 86%
rename from docs/deployment/settings.md
rename to docs/configuration/settings.md
index 960f2c328e8..832099764c2 100644
--- a/docs/deployment/settings.md
+++ b/docs/configuration/settings.md
@@ -16,7 +16,7 @@
-->
-# Introduction to the Kyuubi Configurations System
+# Configurations
Kyuubi provides several ways to configure the system and corresponding engines.
@@ -33,7 +33,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| Key | Default | Meaning | Type | Since |
|-----------------------------------------------|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
-| kyuubi.authentication | NONE | A comma-separated list of client authentication types.
The following tree describes the catalog of each option.
NOSASL
SASL
SASL/PLAIN
NONE
LDAP
JDBC
CUSTOM
SASL/GSSAPI
KERBEROS
Note that: for SASL authentication, KERBEROS and PLAIN auth types are supported at the same time, and only the first specified PLAIN auth type is valid. | seq | 1.0.0 |
+| kyuubi.authentication | NONE | A comma-separated list of client authentication types.
The following tree describes the catalog of each option.
NOSASL
SASL
SASL/PLAIN
NONE
LDAP
JDBC
CUSTOM
SASL/GSSAPI
KERBEROS
Note that: for SASL authentication, KERBEROS and PLAIN auth types are supported at the same time, and only the first specified PLAIN auth type is valid. | set | 1.0.0 |
| kyuubi.authentication.custom.class | <undefined> | User-defined authentication implementation of org.apache.kyuubi.service.authentication.PasswdAuthenticationProvider | string | 1.3.0 |
| kyuubi.authentication.jdbc.driver.class | <undefined> | Driver class name for JDBC Authentication Provider. | string | 1.6.0 |
| kyuubi.authentication.jdbc.password | <undefined> | Database password for JDBC Authentication Provider. | string | 1.6.0 |
@@ -47,29 +47,31 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.authentication.ldap.domain | <undefined> | LDAP domain. | string | 1.0.0 |
| kyuubi.authentication.ldap.groupClassKey | groupOfNames | LDAP attribute name on the group entry that is to be used in LDAP group searches. For example: group, groupOfNames or groupOfUniqueNames. | string | 1.7.0 |
| kyuubi.authentication.ldap.groupDNPattern | <undefined> | COLON-separated list of patterns to use to find DNs for group entities in this directory. Use %s where the actual group name is to be substituted for. For example: CN=%s,CN=Groups,DC=subdomain,DC=domain,DC=com. | string | 1.7.0 |
-| kyuubi.authentication.ldap.groupFilter || COMMA-separated list of LDAP Group names (short name not full DNs). For example: HiveAdmins,HadoopAdmins,Administrators | seq | 1.7.0 |
+| kyuubi.authentication.ldap.groupFilter || COMMA-separated list of LDAP Group names (short name not full DNs). For example: HiveAdmins,HadoopAdmins,Administrators | set | 1.7.0 |
| kyuubi.authentication.ldap.groupMembershipKey | member | LDAP attribute name on the group object that contains the list of distinguished names for the user, group, and contact objects that are members of the group. For example: member, uniqueMember or memberUid | string | 1.7.0 |
| kyuubi.authentication.ldap.guidKey | uid | LDAP attribute name whose values are unique in this LDAP server. For example: uid or CN. | string | 1.2.0 |
| kyuubi.authentication.ldap.url | <undefined> | SPACE character separated LDAP connection URL(s). | string | 1.0.0 |
| kyuubi.authentication.ldap.userDNPattern | <undefined> | COLON-separated list of patterns to use to find DNs for users in this directory. Use %s where the actual group name is to be substituted for. For example: CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com. | string | 1.7.0 |
-| kyuubi.authentication.ldap.userFilter || COMMA-separated list of LDAP usernames (just short names, not full DNs). For example: hiveuser,impalauser,hiveadmin,hadoopadmin | seq | 1.7.0 |
+| kyuubi.authentication.ldap.userFilter || COMMA-separated list of LDAP usernames (just short names, not full DNs). For example: hiveuser,impalauser,hiveadmin,hadoopadmin | set | 1.7.0 |
| kyuubi.authentication.ldap.userMembershipKey | <undefined> | LDAP attribute name on the user object that contains groups of which the user is a direct member, except for the primary group, which is represented by the primaryGroupId. For example: memberOf | string | 1.7.0 |
| kyuubi.authentication.sasl.qop | auth | Sasl QOP enable higher levels of protection for Kyuubi communication with clients.
auth - authentication only (default)
auth-int - authentication plus integrity protection
auth-conf - authentication plus integrity and confidentiality protection. This is applicable only if Kyuubi is configured to use Kerberos authentication.
| string | 1.0.0 |
### Backend
-| Key | Default | Meaning | Type | Since |
-|--------------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
-| kyuubi.backend.engine.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in SQL engine applications | duration | 1.0.0 |
-| kyuubi.backend.engine.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in SQL engine applications | duration | 1.0.0 |
-| kyuubi.backend.engine.exec.pool.size | 100 | Number of threads in the operation execution thread pool of SQL engine applications | int | 1.0.0 |
-| kyuubi.backend.engine.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool in SQL engine applications | int | 1.0.0 |
-| kyuubi.backend.server.event.json.log.path | file:///tmp/kyuubi/events | The location of server events go for the built-in JSON logger | string | 1.4.0 |
-| kyuubi.backend.server.event.loggers || A comma-separated list of server history loggers, where session/operation etc events go.
JSON: the events will be written to the location of kyuubi.backend.server.event.json.log.path
JDBC: to be done
CUSTOM: User-defined event handlers.
Note that: Kyuubi supports custom event handlers with the Java SPI. To register a custom event handler, the user needs to implement a class which is a child of org.apache.kyuubi.events.handler.CustomEventHandlerProvider which has a zero-arg constructor. | seq | 1.4.0 |
-| kyuubi.backend.server.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in Kyuubi server | duration | 1.0.0 |
-| kyuubi.backend.server.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in Kyuubi server | duration | 1.0.0 |
-| kyuubi.backend.server.exec.pool.size | 100 | Number of threads in the operation execution thread pool of Kyuubi server | int | 1.0.0 |
-| kyuubi.backend.server.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool of Kyuubi server | int | 1.0.0 |
+| Key | Default | Meaning | Type | Since |
+|--------------------------------------------------|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
+| kyuubi.backend.engine.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in SQL engine applications | duration | 1.0.0 |
+| kyuubi.backend.engine.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in SQL engine applications | duration | 1.0.0 |
+| kyuubi.backend.engine.exec.pool.size | 100 | Number of threads in the operation execution thread pool of SQL engine applications | int | 1.0.0 |
+| kyuubi.backend.engine.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool in SQL engine applications | int | 1.0.0 |
+| kyuubi.backend.server.event.json.log.path | file:///tmp/kyuubi/events | The location of server events go for the built-in JSON logger | string | 1.4.0 |
+| kyuubi.backend.server.event.kafka.close.timeout | PT5S | Period to wait for Kafka producer of server event handlers to close. | duration | 1.8.0 |
+| kyuubi.backend.server.event.kafka.topic | <undefined> | The topic of server events go for the built-in Kafka logger | string | 1.8.0 |
+| kyuubi.backend.server.event.loggers || A comma-separated list of server history loggers, where session/operation etc events go.
JSON: the events will be written to the location of kyuubi.backend.server.event.json.log.path
KAFKA: the events will be serialized in JSON format and sent to topic of `kyuubi.backend.server.event.kafka.topic`. Note: For the configs of Kafka producer, please specify them with the prefix: `kyuubi.backend.server.event.kafka.`. For example, `kyuubi.backend.server.event.kafka.bootstrap.servers=127.0.0.1:9092`
JDBC: to be done
CUSTOM: User-defined event handlers.
Note that: Kyuubi supports custom event handlers with the Java SPI. To register a custom event handler, the user needs to implement a class which is a child of org.apache.kyuubi.events.handler.CustomEventHandlerProvider which has a zero-arg constructor. | seq | 1.4.0 |
+| kyuubi.backend.server.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in Kyuubi server | duration | 1.0.0 |
+| kyuubi.backend.server.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in Kyuubi server | duration | 1.0.0 |
+| kyuubi.backend.server.exec.pool.size | 100 | Number of threads in the operation execution thread pool of Kyuubi server | int | 1.0.0 |
+| kyuubi.backend.server.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool of Kyuubi server | int | 1.0.0 |
### Batch
@@ -77,7 +79,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
|---------------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
| kyuubi.batch.application.check.interval | PT5S | The interval to check batch job application information. | duration | 1.6.0 |
| kyuubi.batch.application.starvation.timeout | PT3M | Threshold above which to warn batch application may be starved. | duration | 1.7.0 |
-| kyuubi.batch.conf.ignore.list || A comma-separated list of ignored keys for batch conf. If the batch conf contains any of them, the key and the corresponding value will be removed silently during batch job submission. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering. You can also pre-define some config for batch job submission with the prefix: kyuubi.batchConf.[batchType]. For example, you can pre-define `spark.master` for the Spark batch job with key `kyuubi.batchConf.spark.spark.master`. | seq | 1.6.0 |
+| kyuubi.batch.conf.ignore.list || A comma-separated list of ignored keys for batch conf. If the batch conf contains any of them, the key and the corresponding value will be removed silently during batch job submission. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering. You can also pre-define some config for batch job submission with the prefix: kyuubi.batchConf.[batchType]. For example, you can pre-define `spark.master` for the Spark batch job with key `kyuubi.batchConf.spark.spark.master`. | set | 1.6.0 |
| kyuubi.batch.session.idle.timeout | PT6H | Batch session idle timeout, it will be closed when it's not accessed for this duration | duration | 1.6.2 |
### Credentials
@@ -130,30 +132,35 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.engine.chat.memory | 1g | The heap memory for the Chat engine | string | 1.8.0 |
| kyuubi.engine.chat.provider | ECHO | The provider for the Chat engine. Candidates:
ECHO: simply replies a welcome message.
GPT: a.k.a ChatGPT, powered by OpenAI.
| string | 1.8.0 |
| kyuubi.engine.connection.url.use.hostname | true | (deprecated) When true, the engine registers with hostname to zookeeper. When Spark runs on K8s with cluster mode, set to false to ensure that server can connect to engine | boolean | 1.3.0 |
-| kyuubi.engine.deregister.exception.classes || A comma-separated list of exception classes. If there is any exception thrown, whose class matches the specified classes, the engine would deregister itself. | seq | 1.2.0 |
-| kyuubi.engine.deregister.exception.messages || A comma-separated list of exception messages. If there is any exception thrown, whose message or stacktrace matches the specified message list, the engine would deregister itself. | seq | 1.2.0 |
+| kyuubi.engine.deregister.exception.classes || A comma-separated list of exception classes. If there is any exception thrown, whose class matches the specified classes, the engine would deregister itself. | set | 1.2.0 |
+| kyuubi.engine.deregister.exception.messages || A comma-separated list of exception messages. If there is any exception thrown, whose message or stacktrace matches the specified message list, the engine would deregister itself. | set | 1.2.0 |
| kyuubi.engine.deregister.exception.ttl | PT30M | Time to live(TTL) for exceptions pattern specified in kyuubi.engine.deregister.exception.classes and kyuubi.engine.deregister.exception.messages to deregister engines. Once the total error count hits the kyuubi.engine.deregister.job.max.failures within the TTL, an engine will deregister itself and wait for self-terminated. Otherwise, we suppose that the engine has recovered from temporary failures. | duration | 1.2.0 |
| kyuubi.engine.deregister.job.max.failures | 4 | Number of failures of job before deregistering the engine. | int | 1.2.0 |
| kyuubi.engine.event.json.log.path | file:///tmp/kyuubi/events | The location where all the engine events go for the built-in JSON logger.
Local Path: start with 'file://'
HDFS Path: start with 'hdfs://'
| string | 1.3.0 |
| kyuubi.engine.event.loggers | SPARK | A comma-separated list of engine history loggers, where engine/session/operation etc events go.
SPARK: the events will be written to the Spark listener bus.
JSON: the events will be written to the location of kyuubi.engine.event.json.log.path
JDBC: to be done
CUSTOM: User-defined event handlers.
Note that: Kyuubi supports custom event handlers with the Java SPI. To register a custom event handler, the user needs to implement a subclass of `org.apache.kyuubi.events.handler.CustomEventHandlerProvider` which has a zero-arg constructor. | seq | 1.3.0 |
-| kyuubi.engine.flink.extra.classpath | <undefined> | The extra classpath for the Flink SQL engine, for configuring the location of hadoop client jars, etc | string | 1.6.0 |
-| kyuubi.engine.flink.java.options | <undefined> | The extra Java options for the Flink SQL engine | string | 1.6.0 |
-| kyuubi.engine.flink.memory | 1g | The heap memory for the Flink SQL engine | string | 1.6.0 |
+| kyuubi.engine.flink.application.jars | <undefined> | A comma-separated list of the local jars to be shipped with the job to the cluster. For example, SQL UDF jars. Only effective in yarn application mode. | string | 1.8.0 |
+| kyuubi.engine.flink.extra.classpath | <undefined> | The extra classpath for the Flink SQL engine, for configuring the location of hadoop client jars, etc. Only effective in yarn session mode. | string | 1.6.0 |
+| kyuubi.engine.flink.java.options | <undefined> | The extra Java options for the Flink SQL engine. Only effective in yarn session mode. | string | 1.6.0 |
+| kyuubi.engine.flink.memory | 1g | The heap memory for the Flink SQL engine. Only effective in yarn session mode. | string | 1.6.0 |
| kyuubi.engine.hive.event.loggers | JSON | A comma-separated list of engine history loggers, where engine/session/operation etc events go.
JSON: the events will be written to the location of kyuubi.engine.event.json.log.path
JDBC: to be done
CUSTOM: to be done.
| seq | 1.7.0 |
| kyuubi.engine.hive.extra.classpath | <undefined> | The extra classpath for the Hive query engine, for configuring location of the hadoop client jars and etc. | string | 1.6.0 |
| kyuubi.engine.hive.java.options | <undefined> | The extra Java options for the Hive query engine | string | 1.6.0 |
| kyuubi.engine.hive.memory | 1g | The heap memory for the Hive query engine | string | 1.6.0 |
| kyuubi.engine.initialize.sql | SHOW DATABASES | SemiColon-separated list of SQL statements to be initialized in the newly created engine before queries. i.e. use `SHOW DATABASES` to eagerly active HiveClient. This configuration can not be used in JDBC url due to the limitation of Beeline/JDBC driver. | seq | 1.2.0 |
| kyuubi.engine.jdbc.connection.password | <undefined> | The password is used for connecting to server | string | 1.6.0 |
+| kyuubi.engine.jdbc.connection.propagateCredential | false | Whether to use the session's user and password to connect to database | boolean | 1.8.0 |
| kyuubi.engine.jdbc.connection.properties || The additional properties are used for connecting to server | seq | 1.6.0 |
| kyuubi.engine.jdbc.connection.provider | <undefined> | The connection provider is used for getting a connection from the server | string | 1.6.0 |
| kyuubi.engine.jdbc.connection.url | <undefined> | The server url that engine will connect to | string | 1.6.0 |
| kyuubi.engine.jdbc.connection.user | <undefined> | The user is used for connecting to server | string | 1.6.0 |
| kyuubi.engine.jdbc.driver.class | <undefined> | The driver class for JDBC engine connection | string | 1.6.0 |
| kyuubi.engine.jdbc.extra.classpath | <undefined> | The extra classpath for the JDBC query engine, for configuring the location of the JDBC driver and etc. | string | 1.6.0 |
+| kyuubi.engine.jdbc.initialize.sql | SELECT 1 | SemiColon-separated list of SQL statements to be initialized in the newly created engine before queries. i.e. use `SELECT 1` to eagerly active JDBCClient. | seq | 1.8.0 |
| kyuubi.engine.jdbc.java.options | <undefined> | The extra Java options for the JDBC query engine | string | 1.6.0 |
| kyuubi.engine.jdbc.memory | 1g | The heap memory for the JDBC query engine | string | 1.6.0 |
+| kyuubi.engine.jdbc.session.initialize.sql || SemiColon-separated list of SQL statements to be initialized in the newly created engine session before queries. | seq | 1.8.0 |
| kyuubi.engine.jdbc.type | <undefined> | The short name of JDBC type | string | 1.6.0 |
+| kyuubi.engine.kubernetes.submit.timeout | PT30S | The engine submit timeout for Kubernetes application. | duration | 1.7.2 |
| kyuubi.engine.operation.convert.catalog.database.enabled | true | When set to true, The engine converts the JDBC methods of set/get Catalog and set/get Schema to the implementation of different engines | boolean | 1.6.0 |
| kyuubi.engine.operation.log.dir.root | engine_operation_logs | Root directory for query operation log at engine-side. | string | 1.4.0 |
| kyuubi.engine.pool.name | engine-pool | The name of the engine pool. | string | 1.5.0 |
@@ -170,6 +177,13 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.engine.spark.python.env.archive.exec.path | bin/python | The Python exec path under the Python env archive. | string | 1.7.0 |
| kyuubi.engine.spark.python.home.archive | <undefined> | Spark archive containing $SPARK_HOME/python directory, which is used to init session Python worker for Python language mode. | string | 1.7.0 |
| kyuubi.engine.submit.timeout | PT30S | Period to tolerant Driver Pod ephemerally invisible after submitting. In some Resource Managers, e.g. K8s, the Driver Pod is not visible immediately after `spark-submit` is returned. | duration | 1.7.1 |
+| kyuubi.engine.trino.connection.keystore.password | <undefined> | The keystore password used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.keystore.path | <undefined> | The keystore path used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.keystore.type | <undefined> | The keystore type used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.password | <undefined> | The password used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.truststore.password | <undefined> | The truststore password used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.truststore.path | <undefined> | The truststore path used for connecting to trino cluster | string | 1.8.0 |
+| kyuubi.engine.trino.connection.truststore.type | <undefined> | The truststore type used for connecting to trino cluster | string | 1.8.0 |
| kyuubi.engine.trino.event.loggers | JSON | A comma-separated list of engine history loggers, where engine/session/operation etc events go.<br>JSON: the events will be written to the location of kyuubi.engine.event.json.log.path<br>JDBC: to be done<br>CUSTOM: to be done. | seq | 1.7.0 |
| kyuubi.engine.trino.extra.classpath | <undefined> | The extra classpath for the Trino query engine, for configuring other libs which may need by the Trino engine | string | 1.6.0 |
| kyuubi.engine.trino.java.options | <undefined> | The extra Java options for the Trino query engine | string | 1.6.0 |
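Taken together, the new `kyuubi.engine.trino.connection.*` options above let the Trino engine authenticate against a TLS-secured Trino cluster. A minimal `kyuubi-defaults.conf` sketch — all paths and passwords below are illustrative placeholders, not defaults:

```properties
# Hypothetical fragment: TLS client credentials for the Trino engine.
kyuubi.engine.trino.connection.password=trino-user-password
kyuubi.engine.trino.connection.keystore.path=/etc/kyuubi/tls/trino-client.jks
kyuubi.engine.trino.connection.keystore.password=changeit
kyuubi.engine.trino.connection.keystore.type=JKS
kyuubi.engine.trino.connection.truststore.path=/etc/kyuubi/tls/trino-truststore.jks
kyuubi.engine.trino.connection.truststore.password=changeit
kyuubi.engine.trino.connection.truststore.type=JKS
```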
@@ -181,6 +195,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.engine.user.isolated.spark.session | true | When set to false and the engine is running at the group or server share level, all JDBC/ODBC connections will be isolated per user, including temporary views, function registries, SQL configuration, and the current database. Note that it has no effect when the share level is connection or user. | boolean | 1.6.0 |
| kyuubi.engine.user.isolated.spark.session.idle.interval | PT1M | The interval to check if the user-isolated Spark session is timeout. | duration | 1.6.0 |
| kyuubi.engine.user.isolated.spark.session.idle.timeout | PT6H | If kyuubi.engine.user.isolated.spark.session is false, we will release the Spark session if its corresponding user is inactive after this configured timeout. | duration | 1.6.0 |
+| kyuubi.engine.yarn.submit.timeout | PT30S | The engine submit timeout for YARN application. | duration | 1.7.2 |
### Event
@@ -194,6 +209,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| Key | Default | Meaning | Type | Since |
|--------------------------------------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
+| kyuubi.frontend.advertised.host | <undefined> | Hostname or IP of the Kyuubi server's frontend services to publish to external systems such as the service discovery ensemble and metadata store. Use it when you want to advertise a different hostname or IP than the bind host. | string | 1.8.0 |
| kyuubi.frontend.backoff.slot.length | PT0.1S | (deprecated) Time to back off during login to the thrift frontend service. | duration | 1.0.0 |
| kyuubi.frontend.bind.host | <undefined> | Hostname or IP of the machine on which to run the frontend services. | string | 1.0.0 |
| kyuubi.frontend.bind.port | 10009 | (deprecated) Port of the machine on which to run the thrift frontend service via the binary protocol. | int | 1.0.0 |
@@ -220,7 +236,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.frontend.thrift.backoff.slot.length | PT0.1S | Time to back off during login to the thrift frontend service. | duration | 1.4.0 |
| kyuubi.frontend.thrift.binary.bind.host | <undefined> | Hostname or IP of the machine on which to run the thrift frontend service via the binary protocol. | string | 1.4.0 |
| kyuubi.frontend.thrift.binary.bind.port | 10009 | Port of the machine on which to run the thrift frontend service via the binary protocol. | int | 1.4.0 |
-| kyuubi.frontend.thrift.binary.ssl.disallowed.protocols | SSLv2,SSLv3 | SSL versions to disallow for Kyuubi thrift binary frontend. | seq | 1.7.0 |
+| kyuubi.frontend.thrift.binary.ssl.disallowed.protocols | SSLv2,SSLv3 | SSL versions to disallow for Kyuubi thrift binary frontend. | set | 1.7.0 |
| kyuubi.frontend.thrift.binary.ssl.enabled | false | Set this to true for using SSL encryption in thrift binary frontend server. | boolean | 1.7.0 |
| kyuubi.frontend.thrift.binary.ssl.include.ciphersuites || A comma-separated list of include SSL cipher suite names for thrift binary frontend. | seq | 1.7.0 |
| kyuubi.frontend.thrift.http.allow.user.substitution | true | Allow alternate user to be specified as part of open connection request when using HTTP transport mode. | boolean | 1.6.0 |
@@ -254,32 +270,33 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
### Ha
-| Key | Default | Meaning | Type | Since |
-|------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
-| kyuubi.ha.addresses || The connection string for the discovery ensemble | string | 1.6.0 |
-| kyuubi.ha.client.class | org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient | Class name for service discovery client. | string | 1.6.0 |
-| kyuubi.ha.etcd.lease.timeout | PT10S | Timeout for etcd keep alive lease. The kyuubi server will know the unexpected loss of engine after up to this seconds. | duration | 1.6.0 |
-| kyuubi.ha.etcd.ssl.ca.path | <undefined> | Where the etcd CA certificate file is stored. | string | 1.6.0 |
-| kyuubi.ha.etcd.ssl.client.certificate.path | <undefined> | Where the etcd SSL certificate file is stored. | string | 1.6.0 |
-| kyuubi.ha.etcd.ssl.client.key.path | <undefined> | Where the etcd SSL key file is stored. | string | 1.6.0 |
-| kyuubi.ha.etcd.ssl.enabled | false | When set to true, will build an SSL secured etcd client. | boolean | 1.6.0 |
-| kyuubi.ha.namespace | kyuubi | The root directory for the service to deploy its instance uri | string | 1.6.0 |
-| kyuubi.ha.zookeeper.acl.enabled | false | Set to true if the ZooKeeper ensemble is kerberized | boolean | 1.0.0 |
-| kyuubi.ha.zookeeper.auth.digest | <undefined> | The digest auth string is used for ZooKeeper authentication, like: username:password. | string | 1.3.2 |
-| kyuubi.ha.zookeeper.auth.keytab | <undefined> | Location of the Kyuubi server's keytab is used for ZooKeeper authentication. | string | 1.3.2 |
-| kyuubi.ha.zookeeper.auth.principal | <undefined> | Name of the Kerberos principal is used for ZooKeeper authentication. | string | 1.3.2 |
-| kyuubi.ha.zookeeper.auth.type | NONE | The type of ZooKeeper authentication, all candidates are<br>NONE<br>KERBEROS<br>DIGEST | string | 1.3.2 |
-| kyuubi.ha.zookeeper.connection.base.retry.wait | 1000 | Initial amount of time to wait between retries to the ZooKeeper ensemble | int | 1.0.0 |
-| kyuubi.ha.zookeeper.connection.max.retries | 3 | Max retry times for connecting to the ZooKeeper ensemble | int | 1.0.0 |
-| kyuubi.ha.zookeeper.connection.max.retry.wait | 30000 | Max amount of time to wait between retries for BOUNDED_EXPONENTIAL_BACKOFF policy can reach, or max time until elapsed for UNTIL_ELAPSED policy to connect the zookeeper ensemble | int | 1.0.0 |
-| kyuubi.ha.zookeeper.connection.retry.policy | EXPONENTIAL_BACKOFF | The retry policy for connecting to the ZooKeeper ensemble, all candidates are:<br>ONE_TIME<br>N_TIME<br>EXPONENTIAL_BACKOFF<br>BOUNDED_EXPONENTIAL_BACKOFF<br>UNTIL_ELAPSED | string | 1.0.0 |
-| kyuubi.ha.zookeeper.connection.timeout | 15000 | The timeout(ms) of creating the connection to the ZooKeeper ensemble | int | 1.0.0 |
-| kyuubi.ha.zookeeper.engine.auth.type | NONE | The type of ZooKeeper authentication for the engine, all candidates are<br>NONE<br>KERBEROS<br>DIGEST | string | 1.3.2 |
-| kyuubi.ha.zookeeper.namespace | kyuubi | (deprecated) The root directory for the service to deploy its instance uri | string | 1.0.0 |
-| kyuubi.ha.zookeeper.node.creation.timeout | PT2M | Timeout for creating ZooKeeper node | duration | 1.2.0 |
-| kyuubi.ha.zookeeper.publish.configs | false | When set to true, publish Kerberos configs to Zookeeper. Note that the Hive driver needs to be greater than 1.3 or 2.0 or apply HIVE-11581 patch. | boolean | 1.4.0 |
-| kyuubi.ha.zookeeper.quorum || (deprecated) The connection string for the ZooKeeper ensemble | string | 1.0.0 |
-| kyuubi.ha.zookeeper.session.timeout | 60000 | The timeout(ms) of a connected session to be idled | int | 1.0.0 |
+| Key | Default | Meaning | Type | Since |
+|------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
+| kyuubi.ha.addresses || The connection string for the discovery ensemble | string | 1.6.0 |
+| kyuubi.ha.client.class | org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient | Class name for service discovery client. | string | 1.6.0 |
+| kyuubi.ha.etcd.lease.timeout | PT10S | Timeout for the etcd keep-alive lease. The Kyuubi server detects an unexpected engine loss after at most this duration. | duration | 1.6.0 |
+| kyuubi.ha.etcd.ssl.ca.path | <undefined> | Where the etcd CA certificate file is stored. | string | 1.6.0 |
+| kyuubi.ha.etcd.ssl.client.certificate.path | <undefined> | Where the etcd SSL certificate file is stored. | string | 1.6.0 |
+| kyuubi.ha.etcd.ssl.client.key.path | <undefined> | Where the etcd SSL key file is stored. | string | 1.6.0 |
+| kyuubi.ha.etcd.ssl.enabled | false | When set to true, will build an SSL secured etcd client. | boolean | 1.6.0 |
+| kyuubi.ha.namespace | kyuubi | The root directory for the service to deploy its instance uri | string | 1.6.0 |
+| kyuubi.ha.zookeeper.acl.enabled | false | Set to true if the ZooKeeper ensemble is kerberized | boolean | 1.0.0 |
+| kyuubi.ha.zookeeper.auth.digest | <undefined> | The digest auth string is used for ZooKeeper authentication, like: username:password. | string | 1.3.2 |
+| kyuubi.ha.zookeeper.auth.keytab | <undefined> | Location of the Kyuubi server's keytab that is used for ZooKeeper authentication. | string | 1.3.2 |
+| kyuubi.ha.zookeeper.auth.principal | <undefined> | Kerberos principal name that is used for ZooKeeper authentication. | string | 1.3.2 |
+| kyuubi.ha.zookeeper.auth.serverPrincipal | <undefined> | Kerberos principal name of the ZooKeeper server. It only takes effect when the ZooKeeper client version is at least 3.5.7 or 3.6.0, or applies ZOOKEEPER-1467. To use the ZooKeeper 3.6 client, compile Kyuubi with `-Pzookeeper-3.6`. | string | 1.8.0 |
+| kyuubi.ha.zookeeper.auth.type | NONE | The type of ZooKeeper authentication, all candidates are<br>NONE<br>KERBEROS<br>DIGEST | string | 1.3.2 |
+| kyuubi.ha.zookeeper.connection.base.retry.wait | 1000 | Initial amount of time to wait between retries to the ZooKeeper ensemble | int | 1.0.0 |
+| kyuubi.ha.zookeeper.connection.max.retries | 3 | Max retry times for connecting to the ZooKeeper ensemble | int | 1.0.0 |
+| kyuubi.ha.zookeeper.connection.max.retry.wait | 30000 | Max amount of time to wait between retries for BOUNDED_EXPONENTIAL_BACKOFF policy can reach, or max time until elapsed for UNTIL_ELAPSED policy to connect the zookeeper ensemble | int | 1.0.0 |
+| kyuubi.ha.zookeeper.connection.retry.policy | EXPONENTIAL_BACKOFF | The retry policy for connecting to the ZooKeeper ensemble, all candidates are:<br>ONE_TIME<br>N_TIME<br>EXPONENTIAL_BACKOFF<br>BOUNDED_EXPONENTIAL_BACKOFF<br>UNTIL_ELAPSED | string | 1.0.0 |
+| kyuubi.ha.zookeeper.connection.timeout | 15000 | The timeout(ms) of creating the connection to the ZooKeeper ensemble | int | 1.0.0 |
+| kyuubi.ha.zookeeper.engine.auth.type | NONE | The type of ZooKeeper authentication for the engine, all candidates are<br>NONE<br>KERBEROS<br>DIGEST | string | 1.3.2 |
+| kyuubi.ha.zookeeper.namespace | kyuubi | (deprecated) The root directory for the service to deploy its instance uri | string | 1.0.0 |
+| kyuubi.ha.zookeeper.node.creation.timeout | PT2M | Timeout for creating ZooKeeper node | duration | 1.2.0 |
+| kyuubi.ha.zookeeper.publish.configs | false | When set to true, publish Kerberos configs to Zookeeper. Note that the Hive driver needs to be greater than 1.3 or 2.0 or apply HIVE-11581 patch. | boolean | 1.4.0 |
+| kyuubi.ha.zookeeper.quorum || (deprecated) The connection string for the ZooKeeper ensemble | string | 1.0.0 |
+| kyuubi.ha.zookeeper.session.timeout | 60000 | The timeout(ms) of a connected session to be idled | int | 1.0.0 |
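The HA options above are typically combined to register Kyuubi instances under a shared discovery namespace. As a sketch, a `kyuubi-defaults.conf` for a kerberized ZooKeeper ensemble might look like this — the hostnames, principal, and keytab path are placeholders:

```properties
# Hypothetical fragment: service discovery via a kerberized ZooKeeper ensemble.
kyuubi.ha.addresses=zk1:2181,zk2:2181,zk3:2181
kyuubi.ha.namespace=kyuubi
kyuubi.ha.zookeeper.acl.enabled=true
kyuubi.ha.zookeeper.auth.type=KERBEROS
kyuubi.ha.zookeeper.auth.principal=kyuubi/_HOST@EXAMPLE.COM
kyuubi.ha.zookeeper.auth.keytab=/etc/security/keytabs/kyuubi.service.keytab
```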
### Kinit
@@ -300,30 +317,38 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.kubernetes.authenticate.oauthToken | <undefined> | The OAuth token to use when authenticating against the Kubernetes API server. Note that unlike the other authentication options, this must be the exact string value of the token to use for the authentication. | string | 1.7.0 |
| kyuubi.kubernetes.authenticate.oauthTokenFile | <undefined> | Path to the file containing the OAuth token to use when authenticating against the Kubernetes API server. Specify this as a path as opposed to a URI (i.e. do not provide a scheme) | string | 1.7.0 |
| kyuubi.kubernetes.context | <undefined> | The desired context from your kubernetes config file used to configure the K8s client for interacting with the cluster. | string | 1.6.0 |
+| kyuubi.kubernetes.context.allow.list || The allowed Kubernetes context list; if it is empty, there is no Kubernetes context limitation. | set | 1.8.0 |
| kyuubi.kubernetes.master.address | <undefined> | The internal Kubernetes master (API server) address to be used for kyuubi. | string | 1.7.0 |
| kyuubi.kubernetes.namespace | default | The namespace that will be used for running the kyuubi pods and find engines. | string | 1.7.0 |
+| kyuubi.kubernetes.namespace.allow.list || The allowed Kubernetes namespace list; if it is empty, there is no Kubernetes namespace limitation. | set | 1.8.0 |
| kyuubi.kubernetes.terminatedApplicationRetainPeriod | PT5M | The period for which the Kyuubi server retains application information after the application terminates. | duration | 1.7.1 |
| kyuubi.kubernetes.trust.certificates | false | If set to true then client can submit to kubernetes cluster only with token | boolean | 1.7.0 |
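As a sketch, the new allow-list options can fence which clusters and namespaces engines may be submitted to — the context and namespace names below are placeholders, not defaults:

```properties
# Hypothetical fragment: restrict engine submission to known contexts/namespaces
# and fail fast if the engine does not come up within a minute.
kyuubi.kubernetes.context.allow.list=prod-cluster
kyuubi.kubernetes.namespace.allow.list=kyuubi-engines,spark-jobs
kyuubi.engine.kubernetes.submit.timeout=PT60S
```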
+### Lineage
+
+| Key | Default | Meaning | Type | Since |
+|---------------------------------------|--------------------------------------------------------|---------------------------------------------------|--------|-------|
+| kyuubi.lineage.parser.plugin.provider | org.apache.kyuubi.plugin.lineage.LineageParserProvider | The provider for the Spark lineage parser plugin. | string | 1.8.0 |
+
### Metadata
-| Key | Default | Meaning | Type | Since |
-|-------------------------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
-| kyuubi.metadata.cleaner.enabled | true | Whether to clean the metadata periodically. If it is enabled, Kyuubi will clean the metadata that is in the terminate state with max age limitation. | boolean | 1.6.0 |
-| kyuubi.metadata.cleaner.interval | PT30M | The interval to check and clean expired metadata. | duration | 1.6.0 |
-| kyuubi.metadata.max.age | PT72H | The maximum age of metadata, the metadata exceeding the age will be cleaned. | duration | 1.6.0 |
-| kyuubi.metadata.recovery.threads | 10 | The number of threads for recovery from the metadata store when the Kyuubi server restarts. | int | 1.6.0 |
-| kyuubi.metadata.request.async.retry.enabled | true | Whether to retry in async when metadata request failed. When true, return success response immediately even the metadata request failed, and schedule it in background until success, to tolerate long-time metadata store outages w/o blocking the submission request. | boolean | 1.7.0 |
-| kyuubi.metadata.request.async.retry.queue.size | 65536 | The maximum queue size for buffering metadata requests in memory when the external metadata storage is down. Requests will be dropped if the queue exceeds. Only take affect when kyuubi.metadata.request.async.retry.enabled is `true`. | int | 1.6.0 |
-| kyuubi.metadata.request.async.retry.threads | 10 | Number of threads in the metadata request async retry manager thread pool. Only take affect when kyuubi.metadata.request.async.retry.enabled is `true`. | int | 1.6.0 |
-| kyuubi.metadata.request.retry.interval | PT5S | The interval to check and trigger the metadata request retry tasks. | duration | 1.6.0 |
-| kyuubi.metadata.store.class | org.apache.kyuubi.server.metadata.jdbc.JDBCMetadataStore | Fully qualified class name for server metadata store. | string | 1.6.0 |
-| kyuubi.metadata.store.jdbc.database.schema.init | true | Whether to init the JDBC metadata store database schema. | boolean | 1.6.0 |
-| kyuubi.metadata.store.jdbc.database.type | DERBY | The database type for server jdbc metadata store.<br>CUSTOM: User-defined database type, need to specify corresponding JDBC driver.<br>Note that: The JDBC datasource is powered by HiKariCP, for datasource properties, please specify them with the prefix: kyuubi.metadata.store.jdbc.datasource. For example, kyuubi.metadata.store.jdbc.datasource.connectionTimeout=10000. | string | 1.6.0 |
-| kyuubi.metadata.store.jdbc.driver | <undefined> | JDBC driver class name for server jdbc metadata store. | string | 1.6.0 |
-| kyuubi.metadata.store.jdbc.password || The password for server JDBC metadata store. | string | 1.6.0 |
-| kyuubi.metadata.store.jdbc.url | jdbc:derby:memory:kyuubi_state_store_db;create=true | The JDBC url for server JDBC metadata store. By default, it is a DERBY in-memory database url, and the state information is not shared across kyuubi instances. To enable high availability for multiple kyuubi instances, please specify a production JDBC url. | string | 1.6.0 |
-| kyuubi.metadata.store.jdbc.user || The username for server JDBC metadata store. | string | 1.6.0 |
+| Key | Default | Meaning | Type | Since |
+|-------------------------------------------------|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
+| kyuubi.metadata.cleaner.enabled | true | Whether to clean the metadata periodically. If it is enabled, Kyuubi will clean the metadata that is in the terminate state with max age limitation. | boolean | 1.6.0 |
+| kyuubi.metadata.cleaner.interval | PT30M | The interval to check and clean expired metadata. | duration | 1.6.0 |
+| kyuubi.metadata.max.age | PT72H | The maximum age of metadata, the metadata exceeding the age will be cleaned. | duration | 1.6.0 |
+| kyuubi.metadata.recovery.threads | 10 | The number of threads for recovery from the metadata store when the Kyuubi server restarts. | int | 1.6.0 |
+| kyuubi.metadata.request.async.retry.enabled | true | Whether to retry asynchronously when a metadata request fails. When true, a success response is returned immediately even if the metadata request failed, and the request is retried in the background until it succeeds, tolerating long metadata store outages without blocking the submission request. | boolean | 1.7.0 |
+| kyuubi.metadata.request.async.retry.queue.size | 65536 | The maximum queue size for buffering metadata requests in memory when the external metadata storage is down. Requests will be dropped if the queue exceeds this size. Only takes effect when kyuubi.metadata.request.async.retry.enabled is `true`. | int | 1.6.0 |
+| kyuubi.metadata.request.async.retry.threads | 10 | Number of threads in the metadata request async retry manager thread pool. Only takes effect when kyuubi.metadata.request.async.retry.enabled is `true`. | int | 1.6.0 |
+| kyuubi.metadata.request.retry.interval | PT5S | The interval to check and trigger the metadata request retry tasks. | duration | 1.6.0 |
+| kyuubi.metadata.store.class | org.apache.kyuubi.server.metadata.jdbc.JDBCMetadataStore | Fully qualified class name for server metadata store. | string | 1.6.0 |
+| kyuubi.metadata.store.jdbc.database.schema.init | true | Whether to init the JDBC metadata store database schema. | boolean | 1.6.0 |
+| kyuubi.metadata.store.jdbc.database.type | SQLITE | The database type for the server JDBC metadata store.<br>CUSTOM: user-defined database type; the corresponding JDBC driver must be specified.<br>Note: the JDBC datasource is powered by HikariCP; specify datasource properties with the prefix kyuubi.metadata.store.jdbc.datasource, e.g. kyuubi.metadata.store.jdbc.datasource.connectionTimeout=10000. | string | 1.6.0 |
+| kyuubi.metadata.store.jdbc.driver | <undefined> | JDBC driver class name for server jdbc metadata store. | string | 1.6.0 |
+| kyuubi.metadata.store.jdbc.password || The password for server JDBC metadata store. | string | 1.6.0 |
+| kyuubi.metadata.store.jdbc.url | jdbc:sqlite:kyuubi_state_store.db | The JDBC url for server JDBC metadata store. By default, it is a SQLite database url, and the state information is not shared across kyuubi instances. To enable high availability for multiple kyuubi instances, please specify a production JDBC url. | string | 1.6.0 |
+| kyuubi.metadata.store.jdbc.user || The username for server JDBC metadata store. | string | 1.6.0 |
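Since the new SQLite default is a local file, state is still not shared across instances; enabling high availability means pointing `kyuubi.metadata.store.jdbc.*` at a shared database. A hedged sketch using MySQL — the driver, URL, and credentials below are placeholders:

```properties
# Hypothetical fragment: shared MySQL metadata store so batch/session state
# survives restarts and is visible to every Kyuubi instance.
kyuubi.metadata.store.jdbc.database.type=MYSQL
kyuubi.metadata.store.jdbc.driver=com.mysql.cj.jdbc.Driver
kyuubi.metadata.store.jdbc.url=jdbc:mysql://db-host:3306/kyuubi
kyuubi.metadata.store.jdbc.user=kyuubi
kyuubi.metadata.store.jdbc.password=secret
kyuubi.metadata.store.jdbc.datasource.connectionTimeout=10000
```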
### Metrics
@@ -335,7 +360,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.metrics.json.location | metrics | Where the JSON metrics file located | string | 1.2.0 |
| kyuubi.metrics.prometheus.path | /metrics | URI context path of prometheus metrics HTTP server | string | 1.2.0 |
| kyuubi.metrics.prometheus.port | 10019 | Prometheus metrics HTTP server port | int | 1.2.0 |
-| kyuubi.metrics.reporters | JSON | A comma-separated list for all metrics reporters<br>CONSOLE - ConsoleReporter which outputs measurements to CONSOLE periodically.<br>JMX - JmxReporter which listens for new metrics and exposes them as MBeans.<br>JSON - JsonReporter which outputs measurements to json file periodically.<br>PROMETHEUS - PrometheusReporter which exposes metrics in Prometheus format.<br>SLF4J - Slf4jReporter which outputs measurements to system log periodically. | seq | 1.2.0 |
+| kyuubi.metrics.reporters | JSON | A comma-separated list for all metrics reporters<br>CONSOLE - ConsoleReporter which outputs measurements to CONSOLE periodically.<br>JMX - JmxReporter which listens for new metrics and exposes them as MBeans.<br>JSON - JsonReporter which outputs measurements to json file periodically.<br>PROMETHEUS - PrometheusReporter which exposes metrics in Prometheus format.<br>SLF4J - Slf4jReporter which outputs measurements to system log periodically. | set | 1.2.0 |
| kyuubi.metrics.slf4j.interval | PT5S | How often should report metrics to SLF4J logger | duration | 1.2.0 |
### Operation
@@ -347,8 +372,8 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.operation.interrupt.on.cancel | true | When true, all running tasks will be interrupted if one cancels a query. When false, all running tasks will remain until finished. | boolean | 1.2.0 |
| kyuubi.operation.language | SQL | Choose a programming language for the following inputs<br>SQL: (Default) Run all following statements as SQL queries.<br>SCALA: Run all following input as Scala code<br>PYTHON: (Experimental) Run all following input as Python code with the Spark engine | string | 1.5.0 |
| kyuubi.operation.log.dir.root | server_operation_logs | Root directory for query operation log at server-side. | string | 1.4.0 |
-| kyuubi.operation.plan.only.excludes | ResetCommand,SetCommand,SetNamespaceCommand,UseStatement,SetCatalogAndNamespace | Comma-separated list of query plan names, in the form of simple class names, i.e, for `SET abc=xyz`, the value will be `SetCommand`. For those auxiliary plans, such as `switch databases`, `set properties`, or `create temporary view` etc., which are used for setup evaluating environments for analyzing actual queries, we can use this config to exclude them and let them take effect. See also kyuubi.operation.plan.only.mode. | seq | 1.5.0 |
-| kyuubi.operation.plan.only.mode | none | Configures the statement performed mode, The value can be 'parse', 'analyze', 'optimize', 'optimize_with_stats', 'physical', 'execution', or 'none', when it is 'none', indicate to the statement will be fully executed, otherwise only way without executing the query. different engines currently support different modes, the Spark engine supports all modes, and the Flink engine supports 'parse', 'physical', and 'execution', other engines do not support planOnly currently. | string | 1.4.0 |
+| kyuubi.operation.plan.only.excludes | SetCatalogAndNamespace,UseStatement,SetNamespaceCommand,SetCommand,ResetCommand | Comma-separated list of query plan names, in the form of simple class names, i.e., for `SET abc=xyz`, the value will be `SetCommand`. Auxiliary plans such as `switch databases`, `set properties`, or `create temporary view`, which set up the evaluation environment for analyzing actual queries, can be excluded via this config so that they are still executed. See also kyuubi.operation.plan.only.mode. | set | 1.5.0 |
+| kyuubi.operation.plan.only.mode | none | Configures the plan-only execution mode. The value can be 'parse', 'analyze', 'optimize', 'optimize_with_stats', 'physical', 'execution', 'lineage', or 'none'. When set to 'none', the statement is fully executed; otherwise only the plan up to the specified phase is returned without executing the query. Engines currently support different modes: the Spark engine supports all modes, the Flink engine supports 'parse', 'physical', and 'execution', and other engines do not support plan-only currently. | string | 1.4.0 |
| kyuubi.operation.plan.only.output.style | plain | Configures the planOnly output style. The value can be 'plain' or 'json', and the default value is 'plain'. This configuration supports only the output styles of the Spark engine | string | 1.7.0 |
| kyuubi.operation.progress.enabled | false | Whether to enable the operation progress. When true, the operation progress will be returned in `GetOperationStatus`. | boolean | 1.6.0 |
| kyuubi.operation.query.timeout | <undefined> | Timeout for query executions at server-side; it takes effect together with the client-side timeout (`java.sql.Statement.setQueryTimeout`), and a running query will be cancelled automatically on timeout. It is off by default, which means only the client side controls whether the query times out. If set, the client-side timeout is capped at this value. To cancel queries right away without waiting for tasks to finish, consider enabling kyuubi.operation.interrupt.on.cancel together. | duration | 1.2.0 |
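For example, plan-only mode and its companions might be combined as follows — a sketch with illustrative values, not defaults:

```properties
# Return the optimized Spark plan as JSON instead of executing statements,
# while commands in plan.only.excludes (SET, USE, ...) still run normally.
kyuubi.operation.plan.only.mode=optimize
kyuubi.operation.plan.only.output.style=json
```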
@@ -363,7 +388,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| Key | Default | Meaning | Type | Since |
|----------------------------------------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
-| kyuubi.server.administrators || Comma-separated list of Kyuubi service administrators. We use this config to grant admin permission to any service accounts. | seq | 1.8.0 |
+| kyuubi.server.administrators || Comma-separated list of Kyuubi service administrators. We use this config to grant admin permission to any service accounts. | set | 1.8.0 |
| kyuubi.server.info.provider | ENGINE | The server information provider name, some clients may rely on this information to check the server compatibilities and functionalities.<br>SERVER: Return Kyuubi server information.<br>ENGINE: Return Kyuubi engine information. | string | 1.6.1 |
| kyuubi.server.limit.batch.connections.per.ipaddress | <undefined> | Maximum kyuubi server batch connections per ipaddress. Any user exceeding this limit will not be allowed to connect. | int | 1.7.0 |
| kyuubi.server.limit.batch.connections.per.user | <undefined> | Maximum kyuubi server batch connections per user. Any user exceeding this limit will not be allowed to connect. | int | 1.7.0 |
@@ -372,7 +397,8 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.server.limit.connections.per.ipaddress | <undefined> | Maximum kyuubi server connections per ipaddress. Any user exceeding this limit will not be allowed to connect. | int | 1.6.0 |
| kyuubi.server.limit.connections.per.user | <undefined> | Maximum kyuubi server connections per user. Any user exceeding this limit will not be allowed to connect. | int | 1.6.0 |
| kyuubi.server.limit.connections.per.user.ipaddress | <undefined> | Maximum kyuubi server connections per user:ipaddress combination. Any user-ipaddress exceeding this limit will not be allowed to connect. | int | 1.6.0 |
-| kyuubi.server.limit.connections.user.unlimited.list || The maximum connections of the user in the white list will not be limited. | seq | 1.7.0 |
+| kyuubi.server.limit.connections.user.deny.list || Users in the deny list are denied connection to the Kyuubi server; if a user appears in both user.unlimited.list and user.deny.list, the deny list takes precedence. | set | 1.8.0 |
+| kyuubi.server.limit.connections.user.unlimited.list || Connections of users in this allow list will not be limited. | set | 1.7.0 |
| kyuubi.server.name | <undefined> | The name of Kyuubi Server. | string | 1.5.0 |
| kyuubi.server.periodicGC.interval | PT30M | How often to trigger a garbage collection. | duration | 1.7.0 |
| kyuubi.server.redaction.regex | <undefined> | Regex to decide which Kyuubi configuration properties contain sensitive information. When this regex matches a property key or value, the value is redacted from the various logs. || 1.6.0 |
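The connection-limit knobs above compose as in this sketch — the limit and usernames are placeholders:

```properties
# Cap each user at 50 concurrent connections, exempt an ops account, and
# reject a decommissioned account outright (deny list wins over unlimited list).
kyuubi.server.limit.connections.per.user=50
kyuubi.server.limit.connections.user.unlimited.list=ops_admin
kyuubi.server.limit.connections.user.deny.list=blocked_user
```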
@@ -385,13 +411,15 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.session.close.on.disconnect | true | Session will be closed when client disconnects from kyuubi gateway. Set this to false to have session outlive its parent connection. | boolean | 1.8.0 |
| kyuubi.session.conf.advisor | <undefined> | A config advisor plugin for Kyuubi Server. This plugin can provide some custom configs for different users or session configs and overwrite the session configs before opening a new session. This config value should be a subclass of `org.apache.kyuubi.plugin.SessionConfAdvisor` which has a zero-arg constructor. | string | 1.5.0 |
| kyuubi.session.conf.file.reload.interval | PT10M | When `FileSessionConfAdvisor` is used, this configuration defines the expiration time of `$KYUUBI_CONF_DIR/kyuubi-session-<profile>.conf` in the cache. After exceeding this value, the file will be reloaded. | duration | 1.7.0 |
-| kyuubi.session.conf.ignore.list || A comma-separated list of ignored keys. If the client connection contains any of them, the key and the corresponding value will be removed silently during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax. | seq | 1.2.0 |
+| kyuubi.session.conf.ignore.list || A comma-separated list of ignored keys. If the client connection contains any of them, the key and the corresponding value will be removed silently during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax. | set | 1.2.0 |
| kyuubi.session.conf.profile | <undefined> | Specify a profile to load session-level configurations from `$KYUUBI_CONF_DIR/kyuubi-session-<profile>.conf`. This configuration will be ignored if the file does not exist. This configuration only takes effect when `kyuubi.session.conf.advisor` is set as `org.apache.kyuubi.session.FileSessionConfAdvisor`. | string | 1.7.0 |
-| kyuubi.session.conf.restrict.list || A comma-separated list of restricted keys. If the client connection contains any of them, the connection will be rejected explicitly during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax. | seq | 1.2.0 |
+| kyuubi.session.conf.restrict.list || A comma-separated list of restricted keys. If the client connection contains any of them, the connection will be rejected explicitly during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax. | set | 1.2.0 |
+| kyuubi.session.engine.alive.max.failures | 3 | The maximum number of failures allowed for the engine. | int | 1.8.0 |
| kyuubi.session.engine.alive.probe.enabled | false | Whether to enable the engine alive probe. If true, a companion thrift client is created that keeps sending simple requests to check whether the engine is alive. | boolean | 1.6.0 |
| kyuubi.session.engine.alive.probe.interval | PT10S | The interval for engine alive probe. | duration | 1.6.0 |
| kyuubi.session.engine.alive.timeout | PT2M | The timeout for engine alive. If there is no alive probe success in the last timeout window, the engine will be marked as no-alive. | duration | 1.6.0 |
| kyuubi.session.engine.check.interval | PT1M | The check interval for engine timeout | duration | 1.0.0 |
+| kyuubi.session.engine.flink.fetch.timeout | <undefined> | Result fetch timeout for the Flink engine. If the timeout is reached, the result fetch is stopped and the results fetched so far are returned. If no data has been fetched, a TimeoutException is thrown. | duration | 1.8.0 |
| kyuubi.session.engine.flink.main.resource | <undefined> | The package used to create Flink SQL engine remote job. If it is undefined, Kyuubi will use the default | string | 1.4.0 |
| kyuubi.session.engine.flink.max.rows | 1000000 | Max rows of Flink query results. For batch queries, rows exceeding the limit would be ignored. For streaming queries, the query would be canceled if the limit is reached. | int | 1.5.0 |
| kyuubi.session.engine.hive.main.resource | <undefined> | The package used to create Hive engine remote job. If it is undefined, Kyuubi will use the default | string | 1.6.0 |
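As a sketch, the engine alive-probe settings above might be enabled together as follows (values are purely illustrative):

```properties
# Probe engine liveness with a companion thrift client every 10 seconds;
# mark the engine dead if no probe succeeds within a 2-minute window
kyuubi.session.engine.alive.probe.enabled=true
kyuubi.session.engine.alive.probe.interval=PT10S
kyuubi.session.engine.alive.timeout=PT2M

# Give up on the engine after 3 failures (since 1.8.0)
kyuubi.session.engine.alive.max.failures=3
```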
@@ -404,10 +432,12 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.session.engine.open.retry.wait | PT10S | How long to wait before retrying to open the engine after failure. | duration | 1.7.0 |
| kyuubi.session.engine.share.level | USER | (deprecated) - Using kyuubi.engine.share.level instead | string | 1.0.0 |
| kyuubi.session.engine.spark.main.resource | <undefined> | The package used to create Spark SQL engine remote application. If it is undefined, Kyuubi will use the default | string | 1.0.0 |
+| kyuubi.session.engine.spark.max.initial.wait | PT1M | Max wait time for the initial connection to Spark engine. The engine will self-terminate if no new incoming connection is established within this time. This setting only applies at the CONNECTION share level. 0 or negative means not to self-terminate. | duration | 1.8.0 |
| kyuubi.session.engine.spark.max.lifetime | PT0S | Max lifetime for Spark engine, the engine will self-terminate when it reaches the end of life. 0 or negative means not to self-terminate. | duration | 1.6.0 |
| kyuubi.session.engine.spark.progress.timeFormat | yyyy-MM-dd HH:mm:ss.SSS | The time format of the progress bar | string | 1.6.0 |
| kyuubi.session.engine.spark.progress.update.interval | PT1S | Update period of progress bar. | duration | 1.6.0 |
| kyuubi.session.engine.spark.showProgress | false | When true, show the progress bar in the Spark's engine log. | boolean | 1.6.0 |
+| kyuubi.session.engine.startup.destroy.timeout | PT5S | How long to wait for the engine startup process to be destroyed; if the process does not stop within this time, it is forcibly destroyed instead. This configuration only takes effect when `kyuubi.session.engine.startup.waitCompletion=false`. | duration | 1.8.0 |
| kyuubi.session.engine.startup.error.max.size | 8192 | During engine bootstrapping, if an error occurs, use this config to limit the length of the error message (characters). | int | 1.1.0 |
| kyuubi.session.engine.startup.maxLogLines | 10 | The maximum number of engine log lines when errors occur during the engine startup phase. Note that this config takes effect on the client side to help track engine startup issues. | int | 1.4.0 |
| kyuubi.session.engine.startup.waitCompletion | true | Whether to wait for completion after the engine starts. If false, the startup process will be destroyed after the engine is started. Note that only use it when the driver is not running locally, such as in yarn-cluster mode; Otherwise, the engine will be killed. | boolean | 1.5.0 |
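For example, the startup-related settings above could be combined as follows for a yarn-cluster deployment (a hypothetical sketch, not a recommendation):

```properties
# Don't wait for engine completion (driver is not running locally, e.g. yarn-cluster
# mode), and force-kill the startup process if it hasn't exited after 5 seconds
kyuubi.session.engine.startup.waitCompletion=false
kyuubi.session.engine.startup.destroy.timeout=PT5S

# CONNECTION-level Spark engines self-terminate if no client connects within 1 minute
kyuubi.session.engine.spark.max.initial.wait=PT1M
```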
@@ -418,7 +448,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.session.engine.trino.showProgress.debug | false | When true, show the progress debug info in the Trino engine log. | boolean | 1.6.0 |
| kyuubi.session.group.provider | hadoop | A group provider plugin for Kyuubi Server. This plugin can provide primary group and groups information for different users or session configs. This config value should be a subclass of `org.apache.kyuubi.plugin.GroupProvider` which has a zero-arg constructor. Kyuubi provides the following built-in implementations:
hadoop: delegate the user group mapping to hadoop UserGroupInformation.
| string | 1.7.0 |
| kyuubi.session.idle.timeout | PT6H | session idle timeout, it will be closed when it's not accessed for this duration | duration | 1.2.0 |
-| kyuubi.session.local.dir.allow.list || The local dir list that are allowed to access by the kyuubi session application. End-users might set some parameters such as `spark.files` and it will upload some local files when launching the kyuubi engine, if the local dir allow list is defined, kyuubi will check whether the path to upload is in the allow list. Note that, if it is empty, there is no limitation for that. And please use absolute paths. | seq | 1.6.0 |
+| kyuubi.session.local.dir.allow.list || The list of local directories that the kyuubi session application is allowed to access. End-users might set parameters such as `spark.files`, which upload local files when launching the kyuubi engine; if the allow list is defined, kyuubi will check whether each path to upload is in the list. Note that if it is empty, there is no limitation. Please use absolute paths. | set | 1.6.0 |
| kyuubi.session.name | <undefined> | A human readable name of the session and we use empty string by default. This name will be recorded in the event. Note that, we only apply this value from session conf. | string | 1.4.0 |
| kyuubi.session.timeout | PT6H | (deprecated)session timeout, it will be closed when it's not accessed for this duration | duration | 1.0.0 |
| kyuubi.session.user.sign.enabled | false | Whether to verify the integrity of session user name on the engine side, e.g. Authz plugin in Spark. | boolean | 1.7.0 |
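To illustrate, the server-side protection lists above might be set like this (the specific keys and paths are hypothetical examples chosen by an administrator):

```properties
# Silently drop these keys from client connections during engine bootstrap
kyuubi.session.conf.ignore.list=spark.driver.memory

# Reject connections that try to set these keys
kyuubi.session.conf.restrict.list=spark.master

# Only allow uploads (e.g. via spark.files) from these absolute paths
kyuubi.session.local.dir.allow.list=/data/uploads,/tmp/kyuubi
```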
@@ -430,20 +460,28 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.spnego.keytab | <undefined> | Keytab file for SPNego principal | string | 1.6.0 |
| kyuubi.spnego.principal | <undefined> | SPNego service principal, typical value would look like HTTP/_HOST@EXAMPLE.COM. SPNego service principal would be used when restful Kerberos security is enabled. This needs to be set only if SPNEGO is to be used in authentication. | string | 1.6.0 |
+### Yarn
+
+| Key | Default | Meaning | Type | Since |
+|---------------------------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
+| kyuubi.yarn.user.admin | yarn | When kyuubi.yarn.user.strategy is set to ADMIN, use this admin user to construct YARN client for application management, e.g. kill application. | string | 1.8.0 |
+| kyuubi.yarn.user.strategy | NONE | Determine which user to use to construct YARN client for application management, e.g. kill application. Options:
NONE: use Kyuubi server user.
ADMIN: use admin user configured in `kyuubi.yarn.user.admin`.
OWNER: use session user, typically is application owner.
| string | 1.8.0 |
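A minimal sketch of the ADMIN strategy above in `kyuubi-defaults.conf` (illustrative only):

```properties
# Use a dedicated admin user to construct the YARN client for
# application management, e.g. killing applications
kyuubi.yarn.user.strategy=ADMIN
kyuubi.yarn.user.admin=yarn
```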
+
### Zookeeper
-| Key | Default | Meaning | Type | Since |
-|--------------------------------------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
-| kyuubi.zookeeper.embedded.client.port | 2181 | clientPort for the embedded ZooKeeper server to listen for client connections, a client here could be Kyuubi server, engine, and JDBC client | int | 1.2.0 |
-| kyuubi.zookeeper.embedded.client.port.address | <undefined> | clientPortAddress for the embedded ZooKeeper server to | string | 1.2.0 |
-| kyuubi.zookeeper.embedded.data.dir | embedded_zookeeper | dataDir for the embedded zookeeper server where stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database. | string | 1.2.0 |
-| kyuubi.zookeeper.embedded.data.log.dir | embedded_zookeeper | dataLogDir for the embedded ZooKeeper server where writes the transaction log . | string | 1.2.0 |
-| kyuubi.zookeeper.embedded.directory | embedded_zookeeper | The temporary directory for the embedded ZooKeeper server | string | 1.0.0 |
-| kyuubi.zookeeper.embedded.max.client.connections | 120 | maxClientCnxns for the embedded ZooKeeper server to limit the number of concurrent connections of a single client identified by IP address | int | 1.2.0 |
-| kyuubi.zookeeper.embedded.max.session.timeout | 60000 | maxSessionTimeout in milliseconds for the embedded ZooKeeper server will allow the client to negotiate. Defaults to 20 times the tickTime | int | 1.2.0 |
-| kyuubi.zookeeper.embedded.min.session.timeout | 6000 | minSessionTimeout in milliseconds for the embedded ZooKeeper server will allow the client to negotiate. Defaults to 2 times the tickTime | int | 1.2.0 |
-| kyuubi.zookeeper.embedded.port | 2181 | The port of the embedded ZooKeeper server | int | 1.0.0 |
-| kyuubi.zookeeper.embedded.tick.time | 3000 | tickTime in milliseconds for the embedded ZooKeeper server | int | 1.2.0 |
+| Key | Default | Meaning | Type | Since |
+|--------------------------------------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|-------|
+| kyuubi.zookeeper.embedded.client.port | 2181 | clientPort for the embedded ZooKeeper server to listen for client connections, a client here could be Kyuubi server, engine, and JDBC client | int | 1.2.0 |
+| kyuubi.zookeeper.embedded.client.port.address    | <undefined>        | clientPortAddress for the embedded ZooKeeper server to listen for client connections                                                                                       | string  | 1.2.0 |
+| kyuubi.zookeeper.embedded.client.use.hostname    | false              | When true, the embedded ZooKeeper prefers to bind the hostname; otherwise, the IP address.                                                                                 | boolean | 1.7.2 |
+| kyuubi.zookeeper.embedded.data.dir               | embedded_zookeeper | dataDir for the embedded zookeeper server, which stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.  | string  | 1.2.0 |
+| kyuubi.zookeeper.embedded.data.log.dir           | embedded_zookeeper | dataLogDir for the embedded ZooKeeper server, where the transaction log is written.                                                                                        | string  | 1.2.0 |
+| kyuubi.zookeeper.embedded.directory | embedded_zookeeper | The temporary directory for the embedded ZooKeeper server | string | 1.0.0 |
+| kyuubi.zookeeper.embedded.max.client.connections | 120 | maxClientCnxns for the embedded ZooKeeper server to limit the number of concurrent connections of a single client identified by IP address | int | 1.2.0 |
+| kyuubi.zookeeper.embedded.max.session.timeout | 60000 | maxSessionTimeout in milliseconds for the embedded ZooKeeper server will allow the client to negotiate. Defaults to 20 times the tickTime | int | 1.2.0 |
+| kyuubi.zookeeper.embedded.min.session.timeout | 6000 | minSessionTimeout in milliseconds for the embedded ZooKeeper server will allow the client to negotiate. Defaults to 2 times the tickTime | int | 1.2.0 |
+| kyuubi.zookeeper.embedded.port | 2181 | The port of the embedded ZooKeeper server | int | 1.0.0 |
+| kyuubi.zookeeper.embedded.tick.time | 3000 | tickTime in milliseconds for the embedded ZooKeeper server | int | 1.2.0 |
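For illustration, a hypothetical embedded-ZooKeeper setup combining the settings above (the directory paths are examples only):

```properties
# Embedded ZooKeeper: listen on 2181, bind by hostname, and keep snapshots
# and transaction logs in separate directories
kyuubi.zookeeper.embedded.client.port=2181
kyuubi.zookeeper.embedded.client.use.hostname=true
kyuubi.zookeeper.embedded.data.dir=/var/lib/kyuubi/zookeeper
kyuubi.zookeeper.embedded.data.log.dir=/var/lib/kyuubi/zookeeper-txlog
kyuubi.zookeeper.embedded.tick.time=3000
```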
## Spark Configurations
diff --git a/docs/develop_tools/building.md b/docs/contributing/code/building.md
similarity index 93%
rename from docs/develop_tools/building.md
rename to docs/contributing/code/building.md
index d4582dc8dae..8c5c5aeec60 100644
--- a/docs/develop_tools/building.md
+++ b/docs/contributing/code/building.md
@@ -15,9 +15,9 @@
- limitations under the License.
-->
-# Building Kyuubi
+# Building From Source
-## Building Kyuubi with Apache Maven
+## Building With Maven
**Kyuubi** is built based on [Apache Maven](https://maven.apache.org),
@@ -33,7 +33,7 @@ If you want to test it manually, you can start Kyuubi directly from the Kyuubi p
bin/kyuubi start
```
-## Building a Submodule Individually
+## Building A Submodule Individually
For instance, you can build the Kyuubi Common module using:
@@ -49,7 +49,7 @@ For instance, you can build the Kyuubi Common module using:
build/mvn clean package -pl kyuubi-common,kyuubi-ha -DskipTests
```
-## Skipping Some modules
+## Skipping Some Modules
For instance, you can build the Kyuubi modules without Kyuubi Codecov and Assembly modules using:
@@ -57,7 +57,7 @@ For instance, you can build the Kyuubi modules without Kyuubi Codecov and Assemb
mvn clean install -pl '!dev/kyuubi-codecov,!kyuubi-assembly' -DskipTests
```
-## Building Kyuubi against Different Apache Spark versions
+## Building Kyuubi Against Different Apache Spark Versions
Since v1.1.0, Kyuubi supports building with different Spark profiles,
@@ -67,7 +67,7 @@ Since v1.1.0, Kyuubi support building with different Spark profiles,
| -Pspark-3.2 | No | 1.4.0 |
| -Pspark-3.3 | Yes | 1.6.0 |
-## Building with Apache dlcdn site
+## Building With Apache dlcdn Site
By default, we use `https://archive.apache.org/dist/` to download the built-in release packages of engines,
such as Spark or Flink.
diff --git a/docs/develop_tools/debugging.md b/docs/contributing/code/debugging.md
similarity index 98%
rename from docs/develop_tools/debugging.md
rename to docs/contributing/code/debugging.md
index faf7173e427..d3fb6d16f38 100644
--- a/docs/develop_tools/debugging.md
+++ b/docs/contributing/code/debugging.md
@@ -35,7 +35,7 @@ In the IDE, you set the corresponding parameters(host&port) in debug configurati
diff --git a/docs/develop_tools/developer.md b/docs/contributing/code/developer.md
similarity index 76%
rename from docs/develop_tools/developer.md
rename to docs/contributing/code/developer.md
index 329e219de46..ef6fb79889e 100644
--- a/docs/develop_tools/developer.md
+++ b/docs/contributing/code/developer.md
@@ -24,16 +24,6 @@
build/mvn versions:set -DgenerateBackupPoms=false
```
-## Update Document Version
-
-Whenever project version updates, please also update the document version at `docs/conf.py` to target the upcoming release.
-
-For example,
-
-```python
-release = '1.2.0'
-```
-
## Update Dependency List
Kyuubi uses the `dev/dependencyList` file to indicate what upstream dependencies will actually go to the server-side classpath.
@@ -58,3 +48,12 @@ Kyuubi uses settings.md to explain available configurations.
You can run `KYUUBI_UPDATE=1 build/mvn clean test -pl kyuubi-server -am -Pflink-provided,spark-provided,hive-provided -DwildcardSuites=org.apache.kyuubi.config.AllKyuubiConfiguration`
to append descriptions of new configurations to settings.md.
+
+## Generative Tooling Usage
+
+In general, the ASF allows contributions co-authored using generative AI tools. However, there are several considerations when you submit a patch containing generated content.
+
+Foremost, you are required to disclose usage of such a tool. Furthermore, you are responsible for ensuring that the terms and conditions of the tool in question are
+compatible with usage in an Open Source project and inclusion of the generated content doesn't pose a risk of copyright violation.
+
+Please refer to [The ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for more detailed information.
diff --git a/docs/develop_tools/distribution.md b/docs/contributing/code/distribution.md
similarity index 98%
rename from docs/develop_tools/distribution.md
rename to docs/contributing/code/distribution.md
index 217f0a4178d..23c9c6542de 100644
--- a/docs/develop_tools/distribution.md
+++ b/docs/contributing/code/distribution.md
@@ -15,7 +15,7 @@
- limitations under the License.
-->
-# Building a Runnable Distribution
+# Building A Runnable Distribution
To create a Kyuubi distribution like those distributed by [Kyuubi Release Page](https://kyuubi.apache.org/releases.html),
and that is laid out to be runnable, use `./build/dist` in the project root directory.
diff --git a/docs/contributing/code/get_started.rst b/docs/contributing/code/get_started.rst
new file mode 100644
index 00000000000..33981a8cd6d
--- /dev/null
+++ b/docs/contributing/code/get_started.rst
@@ -0,0 +1,70 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Get Started
+===========
+
+Good First Issues
+-----------------
+
+.. image:: https://img.shields.io/github/issues/apache/kyuubi/good%20first%20issue?color=green&label=Good%20first%20issue&logo=gfi&logoColor=red&style=for-the-badge
+ :alt: GitHub issues by-label
+ :target: `Good First Issues`_
+
+**Good First Issue** is an initiative to curate easy pickings for first-time
+contributors. It helps you locate suitable development tasks that require only
+beginner-level skills, and finally make your first contribution to Kyuubi.
+
+After solving one or more good first issues, you should be able to
+
+- Find efficient ways to communicate with the community and get help
+- Set up a `develop environment`_ on your machine
+- `Build`_ Kyuubi from source
+- `Run tests`_ locally
+- `Submit a pull request`_ through Github
+- Be listed in `Apache Kyuubi contributors`_
+- And most importantly, you can move to the next level and try some tricky issues
+
+.. note:: Don't linger too long at this stage.
+ :class: dropdown, toggle
+
+Help Wanted Issues
+------------------
+
+.. image:: https://img.shields.io/github/issues/apache/kyuubi/help%20wanted?color=brightgreen&label=HELP%20WANTED&style=for-the-badge
+ :alt: GitHub issues by-label
+ :target: `Help Wanted Issues`_
+
+Issues that maintainers labeled as help wanted are mostly
+
+- sub-tasks of an ongoing shorthanded umbrella
+- non-urgent improvements
+- bug fixes for corner cases
+- feature requests not covered by the current technology stack of the Kyuubi community
+
+Since these problems are not urgent, you can take your time when fixing them.
+
+.. note:: Help wanted issues may contain easy pickings and tricky ones.
+ :class: dropdown, toggle
+
+
+.. _Good First Issues: https://github.com/apache/kyuubi/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22
+.. _develop environment: idea_setup.html
+.. _Build: build.html
+.. _Run tests: testing.html
+.. _Submit a pull request: https://kyuubi.apache.org/pull_request.html
+.. _Apache Kyuubi contributors: https://github.com/apache/kyuubi/graphs/contributors
+.. _Help Wanted Issues: https://github.com/apache/kyuubi/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22
+
diff --git a/docs/develop_tools/idea_setup.md b/docs/contributing/code/idea_setup.md
similarity index 100%
rename from docs/develop_tools/idea_setup.md
rename to docs/contributing/code/idea_setup.md
diff --git a/docs/develop_tools/index.rst b/docs/contributing/code/index.rst
similarity index 84%
rename from docs/develop_tools/index.rst
rename to docs/contributing/code/index.rst
index c56321cb379..25a6e421baa 100644
--- a/docs/develop_tools/index.rst
+++ b/docs/contributing/code/index.rst
@@ -13,15 +13,19 @@
See the License for the specific language governing permissions and
limitations under the License.
-Develop Tools
-=============
+Contributing Code
+=================
+
+These sections explain the process, guidelines, and tools for contributing
+code to the Kyuubi project.
.. toctree::
:maxdepth: 2
+ get_started
+ style
building
distribution
- build_document
testing
debugging
developer
diff --git a/docs/contributing/code/style.rst b/docs/contributing/code/style.rst
new file mode 100644
index 00000000000..d967e895971
--- /dev/null
+++ b/docs/contributing/code/style.rst
@@ -0,0 +1,39 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Code Style Guide
+================
+
+Code is written once by its author, but read and modified multiple times by
+lots of other engineers. As most bugs actually come from future modification
+of the code, we need to optimize our codebase for long-term, global
+readability and maintainability. The best way to achieve this is to write
+simple code.
+
+Kyuubi's source code is multilingual; a specific code style is applied to
+each corresponding language.
+
+Scala Coding Style Guide
+------------------------
+
+Kyuubi adopts the `Databricks Scala Coding Style Guide`_ for Scala code.
+
+Java Coding Style Guide
+-----------------------
+
+Kyuubi adopts the `Google Java style`_ for Java code.
+
+.. _Databricks Scala Coding Style Guide: https://github.com/databricks/scala-style-guide
+.. _Google Java style: https://google.github.io/styleguide/javaguide.html
\ No newline at end of file
diff --git a/docs/develop_tools/testing.md b/docs/contributing/code/testing.md
similarity index 100%
rename from docs/develop_tools/testing.md
rename to docs/contributing/code/testing.md
diff --git a/docs/contributing/doc/build.rst b/docs/contributing/doc/build.rst
new file mode 100644
index 00000000000..4ec2362f350
--- /dev/null
+++ b/docs/contributing/doc/build.rst
@@ -0,0 +1,96 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Building Documentation
+======================
+
+Follow the steps below to learn how to build the Kyuubi documentation, like the
+one you are reading now.
+
+Setup Environment
+-----------------
+
+- First, install ``virtualenv``. This is optional but recommended, as it creates
+  an independent environment that helps resolve dependency issues when building
+  the documentation.
+
+.. code-block:: sh
+ :caption: Install virtualenv
+
+ $ pip install virtualenv
+
+- Switch to the ``docs`` root directory.
+
+.. code-block:: sh
+ :caption: Switch to docs
+
+ $ cd $KYUUBI_SOURCE_PATH/docs
+
+- Create a virtual environment named 'kyuubi' or anything you like using ``virtualenv``
+  if it does not already exist.
+
+.. code-block:: sh
+ :caption: New virtual environment
+
+ $ virtualenv kyuubi
+
+- Activate the virtual environment,
+
+.. code-block:: sh
+ :caption: Activate virtual environment
+
+ $ source ./kyuubi/bin/activate
+
+Install All Dependencies
+------------------------
+
+Install all dependencies enumerated in the ``requirements.txt``.
+
+.. code-block:: sh
+ :caption: Install dependencies
+
+ $ pip install -r requirements.txt
+
+
+Create Documentation
+--------------------
+
+Make sure you are in the ``$KYUUBI_SOURCE_PATH/docs`` directory.
+
+Linux & MacOS
+~~~~~~~~~~~~~
+
+.. code-block:: sh
+ :caption: Sphinx build on Unix-like OS
+
+ $ make html
+
+Windows
+~~~~~~~
+
+.. code-block:: sh
+ :caption: Sphinx build on Windows
+
+ $ make.bat html
+
+
+If the build process succeeds, the HTML pages are in
+``$KYUUBI_SOURCE_PATH/docs/_build/html``.
+
+View Locally
+------------
+
+Open the ``$KYUUBI_SOURCE_PATH/docs/_build/html/index.html`` file in your
+favorite web browser.
diff --git a/docs/contributing/doc/get_started.rst b/docs/contributing/doc/get_started.rst
new file mode 100644
index 00000000000..f262695b777
--- /dev/null
+++ b/docs/contributing/doc/get_started.rst
@@ -0,0 +1,117 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Get Started
+===========
+
+.. image:: https://img.shields.io/github/issues/apache/kyuubi/kind:documentation?color=green&logo=gfi&logoColor=red&style=for-the-badge
+ :alt: GitHub issues by-label
+
+
+Trivial Fixes
+-------------
+
+For typos, layout, grammar, spelling, punctuation errors and other similar issues
+or changes that occur within a single file, it is acceptable to make edits directly
+on the page being viewed. When viewing a source file on Kyuubi's
+`Github repository`_, a simple click on the ``edit icon`` or keyboard shortcut
+``e`` will activate the editor. Similarly, when viewing files on `Read The Docs`_
+platform, clicking on the ``suggest edit`` button will lead you to the editor.
+These methods do not require any local development environment setup and
+are convenient for making quick fixes.
+
+Upon completion of the editing process, choose the ``commit changes`` option,
+adhere to the provided instructions to submit a pull request,
+and await feedback from the designated reviewer.
+
+Major Fixes
+-----------
+
+For significant modifications that affect multiple files, it is advisable to
+clone the repository to a local development environment, implement the necessary
+changes, and conduct thorough testing prior to submitting a pull request.
+
+
+`Fork`_ The Repository
+~~~~~~~~~~~~~~~~~~~~~~
+
+Clone The Forked Repository
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block::
+ :caption: Clone the repository
+
+ $ git clone https://github.com/your_username/kyuubi.git
+
+Replace "your_username" with your GitHub username. This will create a local
+copy of your forked repository on your machine. You will see the ``master``
+branch if you run ``git branch`` in the ``kyuubi`` folder.
+
+Create A New Branch
+~~~~~~~~~~~~~~~~~~~
+
+.. code-block::
+ :caption: Create a new branch
+
+ $ git checkout -b guide
+ Switched to a new branch 'guide'
+
+Editing And Testing
+~~~~~~~~~~~~~~~~~~~
+
+Make the necessary changes to the documentation files using a text editor.
+`Build and verify`_ the changes you have made to see if they look fine.
+
+.. note::
+ :class: dropdown, toggle
+
+Create A Pull Request
+~~~~~~~~~~~~~~~~~~~~~
+
+Once you have made the changes,
+
+- Commit them with a descriptive commit message using the command:
+
+.. code-block::
+ :caption: commit the changes
+
+ $ git commit -m "Description of changes made"
+
+- Push the changes to your forked repository using the command
+
+.. code-block::
+ :caption: push the changes
+
+ $ git push origin guide
+
+- `Create A Pull Request`_ with a descriptive PR title and description.
+
+- Polish the PR until the reviewers' comments are addressed
+
+Report Only
+-----------
+
+If you don't have time to fix the doc issue and submit a pull request on your own,
+`reporting a document issue`_ also helps. Please follow some basic rules:
+
+- Use the title field to clearly describe the issue
+- Choose the documentation report template
+- Fill out the required field in the documentation report
+
+.. _Home Page: https://kyuubi.apache.org
+.. _Fork: https://github.com/apache/kyuubi/fork
+.. _Build and verify: build.html
+.. _Create A Pull Request: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request
+.. _reporting a document issue: https://github.com/apache/kyuubi/issues/new/choose
\ No newline at end of file
diff --git a/docs/contributing/doc/index.rst b/docs/contributing/doc/index.rst
new file mode 100644
index 00000000000..bf6ae41bde2
--- /dev/null
+++ b/docs/contributing/doc/index.rst
@@ -0,0 +1,44 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Contributing Documentation
+==========================
+
+The project documentation is crucial for users and contributors. This guide
+outlines the contribution guidelines for Apache Kyuubi documentation.
+
+Kyuubi's documentation source files are maintained in the same `github repository`_
+as the code base, which helps keep the code and documentation updated in sync.
+All documentation source files can be found in the sub-folder named ``docs``.
+
+Kyuubi's documentation is published and hosted on the `Read The Docs`_ platform
+by version, with each version having its own dedicated page. To access a
+specific version of the documentation, simply navigate to the "Docs" tab on our
+`Home Page`_.
+
+We welcome any contributions to the documentation, including but not limited to
+writing, translating, reporting doc issues on GitHub, and reposting.
+
+
+.. toctree::
+ :maxdepth: 2
+
+ get_started
+ style
+ build
+
+.. _Github repository: https://github.com/apache/kyuubi
+.. _Restructured Text: https://en.wikipedia.org/wiki/ReStructuredText
+.. _Read The Docs: https://kyuubi.rtfd.io
+.. _Home Page: https://kyuubi.apache.org
\ No newline at end of file
diff --git a/docs/contributing/doc/style.rst b/docs/contributing/doc/style.rst
new file mode 100644
index 00000000000..14cc2b8ac78
--- /dev/null
+++ b/docs/contributing/doc/style.rst
@@ -0,0 +1,135 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+Documentation Style Guide
+=========================
+
+This guide contains guidelines, not rules. While guidelines are important
+to follow, they are not hard and fast rules. It's important to use your
+own judgement and discretion when creating content, and to depart from the
+guidelines when necessary to improve the quality and effectiveness of your
+content. Ultimately, the goal is to create content that is clear, concise,
+and useful to your audience, and sometimes deviating from the guidelines
+may be necessary to achieve that goal.
+
+Goals
+-----
+
+- Source text files are readable and portable
+- Source diagram files are editable
+- Source files are maintainable over time and across community
+
+License Header
+--------------
+
+All original documents should include the ASF license header. All reproduced
+or quoted content should be authorized and attributed to the source.
+
+If you are about to quote content from commercial materials, please refer to
+the `ASF 3RD PARTY LICENSE POLICY`_, or consult the Apache Kyuubi PMC to avoid
+legal issues.
+
+General Style
+-------------
+
+- Use `ReStructuredText`_ or `Markdown`_ format for text, avoid HTML hacks
+- Use `draw.io`_ for drawing or editing an image, and export it as PNG for
+  referencing in documents. A pull request should include both files
+- Use "Kyuubi" for short instead of "Apache Kyuubi" after the first occurrence
+  on the same page
+- Character line limit: 78, except unbreakable ones
+- Prefer lists to tables
+- Prefer unordered lists to ordered ones
+
+ReStructuredText
+----------------
+
+Headings
+~~~~~~~~
+
+- Use **Pascal Case**, every word starts with an uppercase letter,
+ e.g., 'Documentation Style Guide'
+- Use a max of **three levels**
+  - Split into multiple files when an H4 would otherwise be needed
+  - Prefer `directive rubric`_ to H4
+- Use underline-only adornment styles, **DO NOT** use overline
+ - The length of underline characters **SHOULD** match the title
+ - H1 should be underlined with '='
+ - H2 should be underlined with '-'
+ - H3 should be underlined with '~'
+ - H4 should be underlined with '^', but it's better to avoid using H4
+- **DO NOT** use numbering for sections
+- **DO NOT** use "Kyuubi" in titles if possible
+
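+The adornment rules above can be sketched as:
+
+.. code-block::
+   :caption: Heading adornments
+
+   Documentation Style Guide
+   =========================
+
+   General Style
+   -------------
+
+   Headings
+   ~~~~~~~~
+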
+Links
+~~~~~
+
+- Define links with short descriptive phrases, and group them at the bottom of the file
+
+.. note::
+ :class: dropdown, toggle
+
+ .. code-block::
+ :caption: Recommended
+
+ Please refer to `Apache Kyuubi Home Page`_.
+
+ .. _Apache Kyuubi Home Page: https://kyuubi.apache.org/
+
+ .. code-block::
+ :caption: Not recommended
+
+      Please refer to `Apache Kyuubi Home Page <https://kyuubi.apache.org/>`_.
+
+
+Markdown
+--------
+
+Headings
+~~~~~~~~
+
+- Use **Pascal Case**, every word starts with an uppercase letter,
+ e.g., 'Documentation Style Guide'
+- Use a max of **three levels**
+  - Split into multiple files when an H4 would otherwise be needed
+- **DO NOT** use numbering for sections
+- **DO NOT** use "Kyuubi" in titles if possible
+
+Images
+------
+
+Use images only when they provide helpful visual explanations of information
+otherwise difficult to express with words.
+
+Third-party references
+----------------------
+
+If the preceding references don't provide explicit guidance, then see these
+third-party references, depending on the nature of your question:
+
+- `Google developer documentation style`_
+- `Apple Style Guide`_
+- `Red Hat supplementary style guide for product documentation`_
+
+.. References
+
+.. _ASF 3RD PARTY LICENSE POLICY: https://www.apache.org/legal/resolved.html#asf-3rd-party-license-policy
+.. _directive rubric: https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-rubric
+.. _ReStructuredText: https://docutils.sourceforge.io/rst.html
+.. _Markdown: https://en.wikipedia.org/wiki/Markdown
+.. _draw.io: https://www.diagrams.net/
+.. _Google developer documentation style: https://developers.google.com/style
+.. _Apple Style Guide: https://help.apple.com/applestyleguide/
+.. _Red Hat supplementary style guide for product documentation: https://redhat-documentation.github.io/supplementary-style-guide/
diff --git a/docs/deployment/engine_on_kubernetes.md b/docs/deployment/engine_on_kubernetes.md
index 44fca1602e3..a8f7c6ca0e7 100644
--- a/docs/deployment/engine_on_kubernetes.md
+++ b/docs/deployment/engine_on_kubernetes.md
@@ -36,6 +36,17 @@ Spark on Kubernetes config master by using a special format.
You can use cmd `kubectl cluster-info` to get api-server host and port.
+### Deploy Mode
+
+One of the main advantages of the Kyuubi server compared to other interactive Spark clients is that it supports the cluster deploy mode.
+It is highly recommended to run Spark on Kubernetes in cluster mode.
+
+The minimum required configurations are:
+
+* spark.submit.deployMode (cluster)
+* spark.kubernetes.file.upload.path (path on s3 or hdfs)
+* spark.kubernetes.authenticate.driver.serviceAccountName (see [ServiceAccount](#serviceaccount))
+
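+A minimal sketch of these settings in `kyuubi-defaults.conf` (the upload path
+and the service account name below are placeholders; adjust them to your cluster):
+
+```properties
+spark.submit.deployMode=cluster
+spark.kubernetes.file.upload.path=s3a://my-bucket/spark-upload
+spark.kubernetes.authenticate.driver.serviceAccountName=spark
+```
+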
### Docker Image
Spark ships a `./bin/docker-image-tool.sh` script to build and publish the Docker images for running Spark applications on Kubernetes.
diff --git a/docs/deployment/engine_on_yarn.md b/docs/deployment/engine_on_yarn.md
index 6812afa46db..1025418d9c4 100644
--- a/docs/deployment/engine_on_yarn.md
+++ b/docs/deployment/engine_on_yarn.md
@@ -15,13 +15,13 @@
- limitations under the License.
-->
-# Deploy Kyuubi engines on Yarn
+# Deploy Kyuubi engines on YARN
-## Deploy Kyuubi Spark Engine on Yarn
+## Deploy Kyuubi Spark Engine on YARN
### Requirements
-When you want to deploy Kyuubi's Spark SQL engines on YARN, you'd better have cognition upon the following things.
+To deploy Kyuubi's Spark SQL engines on YARN, you should be familiar with the following:
- Knowing the basics about [Running Spark on YARN](https://spark.apache.org/docs/latest/running-on-yarn.html)
- A binary distribution of Spark which is built with YARN support
@@ -113,11 +113,11 @@ so `spark.kerberos.keytab` and `spark.kerberos.principal` should not use now.
Instead, you can schedule a periodically `kinit` process via `crontab` task on the local machine that hosts Kyuubi server or simply use [Kyuubi Kinit](settings.html#kinit).
-## Deploy Kyuubi Flink Engine on Yarn
+## Deploy Kyuubi Flink Engine on YARN
### Requirements
-When you want to deploy Kyuubi's Flink SQL engines on YARN, you'd better have cognition upon the following things.
+To deploy Kyuubi's Flink SQL engines on YARN, you should be familiar with the following:
- Knowing the basics about [Running Flink on YARN](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/yarn)
- A binary distribution of Flink which is built with YARN support
@@ -127,13 +127,59 @@ When you want to deploy Kyuubi's Flink SQL engines on YARN, you'd better have co
- An active Object Storage cluster, e.g. [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html), S3 and [Minio](https://min.io/) etc.
- Setup Hadoop client configurations at the machine the Kyuubi server locates
-### Yarn Session Mode
+### Flink Deployment Modes
+
+Currently, Flink supports two deployment modes on YARN: [YARN Application Mode](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/yarn/#application-mode) and [YARN Session Mode](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/yarn/#session-mode).
+
+- YARN Application Mode: In this mode, Kyuubi starts a dedicated Flink application cluster and runs the SQL engine on it.
+- YARN Session Mode: In this mode, Kyuubi starts the Flink SQL engine locally and connects to a running Flink YARN session cluster.
+
+As Kyuubi has to know the deployment mode before starting the SQL engine, it's required to specify the deployment mode in Kyuubi configuration.
+
+```properties
+# candidates: yarn-application, yarn-session
+flink.execution.target=yarn-application
+```
+
+### YARN Application Mode
+
+#### Flink Configurations
+
+Since the Flink SQL engine runs inside the JobManager, it's recommended to tune the resource configurations of the JobManager based on your workload.
+
+The related Flink configurations are listed below (see more details at [Flink Configuration](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#yarn)):
+
+| Name | Default | Meaning |
+|--------------------------------|---------|----------------------------------------------------------------------------------------|
+| yarn.appmaster.vcores | 1 | The number of virtual cores (vcores) used by the JobManager (YARN application master). |
+| jobmanager.memory.process.size | (none) | Total size of the memory of the JobManager process. |
+
+Note that Flink application mode doesn't support HA for multiple jobs for now, and this also applies to Kyuubi's Flink SQL engine. If the JobManager fails and restarts, the submitted jobs will not be recovered and must be re-submitted.
+
+#### Environment
+
+Ensure that either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` is configured and points to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.
+
+You can verify your setup with the following commands:
+
+```bash
+# we assume to be in the root directory of
+# the unzipped Flink distribution
+
+# (0) export HADOOP_CLASSPATH
+export HADOOP_CLASSPATH=`hadoop classpath`
+
+# (1) submit a Flink job and ensure it runs successfully
+./bin/flink run -m yarn-cluster ./examples/streaming/WordCount.jar
+```
+
+### YARN Session Mode
#### Flink Configurations
```bash
execution.target: yarn-session
-# Yarn Session Cluster application id.
+# YARN Session Cluster application id.
yarn.application.id: application_00000000XX_00XX
```
@@ -194,23 +240,19 @@ To use Hadoop vanilla jars, please configure $KYUUBI_HOME/conf/kyuubi-env.sh as
$ echo "export FLINK_HADOOP_CLASSPATH=`hadoop classpath`" >> $KYUUBI_HOME/conf/kyuubi-env.sh
```
-### Deployment Modes Supported by Flink on YARN
-
-For experiment use, we recommend deploying Kyuubi Flink SQL engine in [Session Mode](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/yarn/#session-mode).
-At present, [Application Mode](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/yarn/#application-mode) and [Per-Job Mode (deprecated)](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/resource-providers/yarn/#per-job-mode-deprecated) are not supported for Flink engine.
-
### Kerberos
-As Kyuubi Flink SQL engine wraps the Flink SQL client that currently does not support [Flink Kerberos Configuration](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/#security-kerberos-login-keytab),
-so `security.kerberos.login.keytab` and `security.kerberos.login.principal` should not use now.
+In YARN application mode, Kerberos is supported natively by Flink; see [Flink Kerberos Configuration](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/#security-kerberos-login-keytab) for details.
-Instead, you can schedule a periodically `kinit` process via `crontab` task on the local machine that hosts Kyuubi server or simply use [Kyuubi Kinit](settings.html#kinit).
+In YARN session mode, `security.kerberos.login.keytab` and `security.kerberos.login.principal` take no effect, as the Kyuubi Flink SQL engine relies on the Flink SQL client, which currently does not support [Flink Kerberos Configuration](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/#security-kerberos-login-keytab).
+
+As a workaround, you can schedule a periodic `kinit` process via a `crontab` task on the local machine that hosts the Kyuubi server, or simply use [Kyuubi Kinit](settings.html#kinit).
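+
+For instance, a sketch of such a `crontab` entry (the keytab path and principal below are placeholders):
+
+```shell
+# renew the Kerberos TGT every 6 hours on the host running the Kyuubi server
+0 */6 * * * kinit -kt /path/to/kyuubi.keytab kyuubi/host@EXAMPLE.COM
+```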
-## Deploy Kyuubi Hive Engine on Yarn
+## Deploy Kyuubi Hive Engine on YARN
### Requirements
-When you want to deploy Kyuubi's Hive SQL engines on YARN, you'd better have cognition upon the following things.
+To deploy Kyuubi's Hive SQL engines on YARN, you should be familiar with the following:
- Knowing the basics about [Running Hive on YARN](https://cwiki.apache.org/confluence/display/Hive/GettingStarted)
- A binary distribution of Hive
@@ -239,7 +281,7 @@ $ $HIVE_HOME/bin/beeline -u 'jdbc:hive2://localhost:10000/default'
0: jdbc:hive2://localhost:10000/default> INSERT INTO TABLE pokes VALUES (1, 'hello');
```
-If the `Hive SQL` passes and there is a job in Yarn Web UI, It indicates the hive environment is normal.
+If the `Hive SQL` passes and a job appears on the YARN Web UI, it indicates the Hive environment is working well.
#### Required Environment Variable
diff --git a/docs/deployment/high_availability_guide.md b/docs/deployment/high_availability_guide.md
index 353e549ebba..51c87815765 100644
--- a/docs/deployment/high_availability_guide.md
+++ b/docs/deployment/high_availability_guide.md
@@ -39,7 +39,7 @@ Using multiple Kyuubi service units with load balancing instead of a single unit
- High concurrency
- By adding or removing Kyuubi server instances can easily scale up or down to meet the need of client requests.
- Upgrade smoothly
- - Kyuubi server supports stop gracefully. We could delete a `k.i.` but not stop it immediately.
+ - Kyuubi server supports stopping gracefully. We could delete a `k.i.` but not stop it immediately.
In this case, the `k.i.` will not take any new connection request but only operation requests from existing connections.
After all connection are released, it stops then.
- The dependencies of Kyuubi engines are free to change, such as bump up versions, modify configurations, add external jars, relocate to another engine home. Everything will be reloaded during start and stop.
diff --git a/docs/deployment/index.rst b/docs/deployment/index.rst
index ec3ece95145..1b6bf876678 100644
--- a/docs/deployment/index.rst
+++ b/docs/deployment/index.rst
@@ -31,15 +31,6 @@ Basics
high_availability_guide
migration-guide
-Configurations
---------------
-
-.. toctree::
- :maxdepth: 2
- :glob:
-
- settings
-
Engines
-------
diff --git a/docs/deployment/kyuubi_on_kubernetes.md b/docs/deployment/kyuubi_on_kubernetes.md
index 8bb1d88c3fe..11ffe8e4859 100644
--- a/docs/deployment/kyuubi_on_kubernetes.md
+++ b/docs/deployment/kyuubi_on_kubernetes.md
@@ -90,7 +90,7 @@ See more related details in [Using RBAC Authorization](https://kubernetes.io/doc
## Config
-You can configure Kyuubi the old-fashioned way by placing kyuubi-default.conf inside the image. Kyuubi do not recommend using this way on Kubernetes.
+You can configure Kyuubi the old-fashioned way by placing `kyuubi-defaults.conf` inside the image. Kyuubi does not recommend using this way on Kubernetes.
Kyuubi provide `${KYUUBI_HOME}/docker/kyuubi-configmap.yaml` to build Configmap for Kyuubi.
diff --git a/docs/deployment/migration-guide.md b/docs/deployment/migration-guide.md
index fc916048c43..27dad2aba92 100644
--- a/docs/deployment/migration-guide.md
+++ b/docs/deployment/migration-guide.md
@@ -17,6 +17,17 @@
# Kyuubi Migration Guide
+## Upgrading from Kyuubi 1.7 to 1.8
+
+* Since Kyuubi 1.8, SQLite is introduced and becomes the default database type of the Kyuubi metastore, as Derby has been deprecated.
+ Both Derby and SQLite are mainly for testing purposes, and they're not supposed to be used in production.
+ To restore previous behavior, set `kyuubi.metadata.store.jdbc.database.type=DERBY` and
+ `kyuubi.metadata.store.jdbc.url=jdbc:derby:memory:kyuubi_state_store_db;create=true`.
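+
+  For reference, the restored settings in `kyuubi-defaults.conf` would look like:
+
+  ```properties
+  kyuubi.metadata.store.jdbc.database.type=DERBY
+  kyuubi.metadata.store.jdbc.url=jdbc:derby:memory:kyuubi_state_store_db;create=true
+  ```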
+
+## Upgrading from Kyuubi 1.7.1 to 1.7.2
+
+* Since Kyuubi 1.7.2, for Kyuubi BeeLine, please use the `--python-mode` option to run Python code or scripts.
+
## Upgrading from Kyuubi 1.7.0 to 1.7.1
* Since Kyuubi 1.7.1, `protocolVersion` is removed from the request parameters of the REST API `Open(create) a session`. All removed or unknown parameters will be silently ignored and affects nothing.
diff --git a/docs/deployment/spark/aqe.md b/docs/deployment/spark/aqe.md
index 90cc5aff84c..3682c7f9ec5 100644
--- a/docs/deployment/spark/aqe.md
+++ b/docs/deployment/spark/aqe.md
@@ -210,7 +210,7 @@ Kyuubi is a long-running service to make it easier for end-users to use Spark SQ
### Setting Default Configurations
-[Configuring by `spark-defaults.conf`](settings.html#via-spark-defaults-conf) at the engine side is the best way to set up Kyuubi with AQE. All engines will be instantiated with AQE enabled.
+[Configuring by `spark-defaults.conf`](../settings.html#via-spark-defaults-conf) at the engine side is the best way to set up Kyuubi with AQE. All engines will be instantiated with AQE enabled.
Here is a config setting that we use in our platform when deploying Kyuubi.
diff --git a/docs/deployment/spark/dynamic_allocation.md b/docs/deployment/spark/dynamic_allocation.md
index b177b63c365..1a5057e731f 100644
--- a/docs/deployment/spark/dynamic_allocation.md
+++ b/docs/deployment/spark/dynamic_allocation.md
@@ -170,7 +170,7 @@ Kyuubi is a long-running service to make it easier for end-users to use Spark SQ
### Setting Default Configurations
-[Configuring by `spark-defaults.conf`](settings.html#via-spark-defaults-conf) at the engine side is the best way to set up Kyuubi with DRA. All engines will be instantiated with DRA enabled.
+[Configuring by `spark-defaults.conf`](../settings.html#via-spark-defaults-conf) at the engine side is the best way to set up Kyuubi with DRA. All engines will be instantiated with DRA enabled.
Here is a config setting that we use in our platform when deploying Kyuubi.
diff --git a/docs/develop_tools/build_document.md b/docs/develop_tools/build_document.md
deleted file mode 100644
index 0be5a180705..00000000000
--- a/docs/develop_tools/build_document.md
+++ /dev/null
@@ -1,76 +0,0 @@
-
-
-# Building Kyuubi Documentation
-
-Follow the steps below and learn how to build the Kyuubi documentation as the one you are watching now.
-
-## Install & Activate `virtualenv`
-
-Firstly, install `virtualenv`, this is optional but recommended as it is useful to create an independent environment to resolve dependency issues for building the documentation.
-
-```bash
-pip install virtualenv
-```
-
-Switch to the `docs` root directory.
-
-```bash
-cd $KYUUBI_SOURCE_PATH/docs
-```
-
-Create a virtual environment named 'kyuubi' or anything you like using `virtualenv` if it's not existing.
-
-```bash
-virtualenv kyuubi
-```
-
-Activate it,
-
-```bash
-source ./kyuubi/bin/activate
-```
-
-## Install all dependencies
-
-Install all dependencies enumerated in the `requirements.txt`.
-
-```bash
-pip install -r requirements.txt
-```
-
-## Create Documentation
-
-Make sure you are in the `$KYUUBI_SOURCE_PATH/docs` directory.
-
-linux & macos
-
-```bash
-make html
-```
-
-windows
-
-```bash
-make.bat html
-```
-
-If the build process succeed, the HTML pages are in `$KYUUBI_SOURCE_PATH/docs/_build/html`.
-
-## View Locally
-
-Open the `$KYUUBI_SOURCE_PATH/docs/_build/html/index.html` file in your favorite web browser.
diff --git a/docs/extensions/engines/flink/functions.md b/docs/extensions/engines/flink/functions.md
new file mode 100644
index 00000000000..1d047d07889
--- /dev/null
+++ b/docs/extensions/engines/flink/functions.md
@@ -0,0 +1,30 @@
+
+
+# Auxiliary SQL Functions
+
+Kyuubi provides several auxiliary SQL functions as a supplement to
+Flink's [Built-in Functions](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/systemfunctions/).
+
+| Name | Description | Return Type | Since |
+|---------------------|-------------------------------------------------------------|-------------|-------|
+| kyuubi_version | Return the version of Kyuubi Server | string | 1.8.0 |
+| kyuubi_engine_name | Return the application name for the associated query engine | string | 1.8.0 |
+| kyuubi_engine_id | Return the application id for the associated query engine | string | 1.8.0 |
+| kyuubi_system_user | Return the system user name for the associated query engine | string | 1.8.0 |
+| kyuubi_session_user | Return the session username for the associated query engine | string | 1.8.0 |
+
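+These functions can be called like any built-in function once connected to a Flink engine, for example (illustrative only):
+
+```sql
+SELECT kyuubi_version(), kyuubi_engine_name();
+```
+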
diff --git a/docs/extensions/engines/flink/index.rst b/docs/extensions/engines/flink/index.rst
index 01bbecf9263..58105b0fa76 100644
--- a/docs/extensions/engines/flink/index.rst
+++ b/docs/extensions/engines/flink/index.rst
@@ -20,6 +20,7 @@ Extensions for Flink
:maxdepth: 1
../../../connector/flink/index
+ functions
.. warning::
This page is still in-progress.
diff --git a/docs/extensions/engines/hive/functions.md b/docs/extensions/engines/hive/functions.md
new file mode 100644
index 00000000000..24094ecce31
--- /dev/null
+++ b/docs/extensions/engines/hive/functions.md
@@ -0,0 +1,30 @@
+
+
+
+# Auxiliary SQL Functions
+
+Kyuubi provides several auxiliary SQL functions as a supplement to Hive's [Built-in Functions](https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-Built-inFunctions).
+
+| Name | Description | Return Type | Since |
+|----------------|-------------------------------------|-------------|-------|
+| kyuubi_version | Return the version of Kyuubi Server | string | 1.8.0 |
+| engine_name | Return the name of engine | string | 1.8.0 |
+| engine_id | Return the id of engine | string | 1.8.0 |
+| system_user | Return the system user | string | 1.8.0 |
+| session_user | Return the session user | string | 1.8.0 |
+
diff --git a/docs/extensions/engines/hive/index.rst b/docs/extensions/engines/hive/index.rst
index 8aeebf1bc8b..f43ec11e0b1 100644
--- a/docs/extensions/engines/hive/index.rst
+++ b/docs/extensions/engines/hive/index.rst
@@ -20,6 +20,7 @@ Extensions for Hive
:maxdepth: 2
../../../connector/hive/index
+ functions
.. warning::
This page is still in-progress.
diff --git a/docs/extensions/engines/spark/functions.md b/docs/extensions/engines/spark/functions.md
index 66f22aea860..78c2692436f 100644
--- a/docs/extensions/engines/spark/functions.md
+++ b/docs/extensions/engines/spark/functions.md
@@ -27,4 +27,5 @@ Kyuubi provides several auxiliary SQL functions as supplement to Spark's [Built-
| engine_id | Return the spark application id for the associated query engine | string | 1.4.0 |
| system_user | Return the system user name for the associated query engine | string | 1.3.0 |
| session_user | Return the session username for the associated query engine | string | 1.4.0 |
+| engine_url | Return the engine url for the associated query engine | string | 1.8.0 |
diff --git a/docs/extensions/engines/spark/lineage.md b/docs/extensions/engines/spark/lineage.md
index cd38be4ba12..2dbb2a026d3 100644
--- a/docs/extensions/engines/spark/lineage.md
+++ b/docs/extensions/engines/spark/lineage.md
@@ -45,14 +45,14 @@ The lineage of this SQL:
```json
{
- "inputTables": ["default.test_table0"],
+ "inputTables": ["spark_catalog.default.test_table0"],
"outputTables": [],
"columnLineage": [{
"column": "col0",
- "originalColumns": ["default.test_table0.a"]
+ "originalColumns": ["spark_catalog.default.test_table0.a"]
}, {
"column": "col1",
- "originalColumns": ["default.test_table0.b"]
+ "originalColumns": ["spark_catalog.default.test_table0.b"]
}]
}
```
@@ -101,13 +101,12 @@ Kyuubi Spark Lineage Listener Extension is built using [Apache Maven](https://ma
To build it, `cd` to the root direct of kyuubi project and run:
```shell
-build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -DskipTests
+build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -am -DskipTests
```
After a while, if everything goes well, you will get the plugin finally in two parts:
- The main plugin jar, which is under `./extensions/spark/kyuubi-spark-lineage/target/kyuubi-spark-lineage_${scala.binary.version}-${project.version}.jar`
-- The least transitive dependencies needed, which are under `./extensions/spark/kyuubi-spark-lineage/target/scala-${scala.binary.version}/jars`
### Build against Different Apache Spark Versions
@@ -118,7 +117,7 @@ Sometimes, it may be incompatible with other Spark distributions, then you may n
For example,
```shell
-build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -DskipTests -Dspark.version=3.1.2
+build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -am -DskipTests -Dspark.version=3.1.2
```
The available `spark.version`s are shown in the following table.
@@ -126,6 +125,7 @@ The available `spark.version`s are shown in the following table.
| Spark Version | Supported | Remark |
|:-------------:|:---------:|:------:|
| master | √ | - |
+| 3.4.x | √ | - |
| 3.3.x | √ | - |
| 3.2.x | √ | - |
| 3.1.x | √ | - |
@@ -186,6 +186,7 @@ The lineage dispatchers are used to dispatch lineage events, configured via `spa
SPARK_EVENT (by default): send lineage event to spark event bus
KYUUBI_EVENT: send lineage event to kyuubi event bus
+
ATLAS: send lineage to apache atlas
#### Get Lineage Events from SparkListener
@@ -208,3 +209,24 @@ spark.sparkContext.addSparkListener(new SparkListener {
#### Get Lineage Events from Kyuubi EventHandler
When using the `KYUUBI_EVENT` dispatcher, the lineage events will be sent to the Kyuubi `EventBus`. Refer to [Kyuubi Event Handler](../../server/events) to handle kyuubi events.
+
+#### Ingest Lineage Entities to Apache Atlas
+
+The lineage entities can be ingested into [Apache Atlas](https://atlas.apache.org/) using the `ATLAS` dispatcher.
+
+Extra steps:
+
++ The minimal set of transitive dependencies is needed, which are located under `./extensions/spark/kyuubi-spark-lineage/target/scala-${scala.binary.version}/jars`
++ Use `spark.files` to specify the `atlas-application.properties` configuration file for Atlas
+
+Atlas client configurations (configured in `atlas-application.properties` or passed with the `spark.atlas.` prefix):
+
+| Name | Default Value | Description | Since |
+|-----------------------------------------|------------------------|-------------------------------------------------------|-------|
+| atlas.rest.address | http://localhost:21000 | The rest endpoint url for the Atlas server | 1.8.0 |
+| atlas.client.type | rest | The client type (currently only supports rest) | 1.8.0 |
+| atlas.client.username | none | The client username | 1.8.0 |
+| atlas.client.password | none | The client password | 1.8.0 |
+| atlas.cluster.name | primary | The cluster name to use in qualifiedName of entities. | 1.8.0 |
+| atlas.hook.spark.column.lineage.enabled | true | Whether to ingest column lineages to Atlas. | 1.8.0 |
+
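+An assumed example of passing these settings via Spark configuration with the `spark.atlas.` prefix (the endpoint and credentials below are placeholders):
+
+```properties
+spark.atlas.rest.address=http://atlas-server:21000
+spark.atlas.client.username=admin
+spark.atlas.client.password=admin
+```
+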
diff --git a/docs/extensions/engines/spark/rules.md b/docs/extensions/engines/spark/rules.md
index a4bda5d53ff..4614f52440a 100644
--- a/docs/extensions/engines/spark/rules.md
+++ b/docs/extensions/engines/spark/rules.md
@@ -66,14 +66,15 @@ Kyuubi provides some configs to make these feature easy to use.
| Name | Default Value | Description | Since |
|---------------------------------------------------------------------|----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
| spark.sql.optimizer.insertRepartitionBeforeWrite.enabled | true | Add repartition node at the top of query plan. An approach of merging small files. | 1.2.0 |
-| spark.sql.optimizer.insertRepartitionNum | none | The partition number if `spark.sql.optimizer.insertRepartitionBeforeWrite.enabled` is enabled. If AQE is disabled, the default value is `spark.sql.shuffle.partitions`. If AQE is enabled, the default value is none that means depend on AQE. | 1.2.0 |
+| spark.sql.optimizer.insertRepartitionNum | none | The partition number if `spark.sql.optimizer.insertRepartitionBeforeWrite.enabled` is enabled. If AQE is disabled, the default value is `spark.sql.shuffle.partitions`. If AQE is enabled, the default value is none, which means it depends on AQE. This config is used for Spark 3.1 only. | 1.2.0 |
| spark.sql.optimizer.dynamicPartitionInsertionRepartitionNum | 100 | The partition number of each dynamic partition if `spark.sql.optimizer.insertRepartitionBeforeWrite.enabled` is enabled. We will repartition by dynamic partition columns to reduce the small file but that can cause data skew. This config is to extend the partition of dynamic partition column to avoid skew but may generate some small files. | 1.2.0 |
| spark.sql.optimizer.forceShuffleBeforeJoin.enabled | false | Ensure shuffle node exists before shuffled join (shj and smj) to make AQE `OptimizeSkewedJoin` works (complex scenario join, multi table join). | 1.2.0 |
| spark.sql.optimizer.finalStageConfigIsolation.enabled               | false                                  | If true, the final stage can use a different config from previous stages. The prefix of a final stage config key should be `spark.sql.finalStage.`. For example, for the raw Spark config `spark.sql.adaptive.advisoryPartitionSizeInBytes`, the final stage config should be `spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes`.             | 1.2.0 |
| spark.sql.analyzer.classification.enabled                           | false                                  | When true, allows the Kyuubi engine to judge this SQL's classification and set `spark.sql.analyzer.classification` back into sessionConf. Through this configuration item, Spark can optimize configurations dynamically.                                                                                                                               | 1.4.0 |
| spark.sql.optimizer.insertZorderBeforeWriting.enabled | true | When true, we will follow target table properties to insert zorder or not. The key properties are: 1) `kyuubi.zorder.enabled`: if this property is true, we will insert zorder before writing data. 2) `kyuubi.zorder.cols`: string split by comma, we will zorder by these cols. | 1.4.0 |
| spark.sql.optimizer.zorderGlobalSort.enabled                        | true                                   | When true, we do a global sort using zorder. Note that it can cause data skew issues if the zorder columns have low cardinality. When false, we only do a local sort using zorder.                                                                                                                                                                      | 1.4.0 |
-| spark.sql.watchdog.maxPartitions | none | Set the max partition number when spark scans a data source. Enable MaxPartitionStrategy by specifying this configuration. Add maxPartitions Strategy to avoid scan excessive partitions on partitioned table, it's optional that works with defined | 1.4.0 |
+| spark.sql.watchdog.maxPartitions                                    | none                                   | Set the max partition number when Spark scans a data source. Specifying this configuration enables the maxPartitions strategy, which avoids scanning excessive partitions on partitioned tables; it is optional.                                                                                                                                        | 1.4.0 |
+| spark.sql.watchdog.maxFileSize                                      | none                                   | Set the maximum size in bytes of files when Spark scans a data source. Specifying this configuration enables the maxFileSize strategy, which avoids scanning an excessive total file size; it is optional.                                                                                                                                              | 1.8.0 |
| spark.sql.optimizer.dropIgnoreNonExistent | false | When true, do not report an error if DROP DATABASE/TABLE/VIEW/FUNCTION/PARTITION specifies a non-existent database/table/view/function/partition | 1.5.0 |
| spark.sql.optimizer.rebalanceBeforeZorder.enabled | false | When true, we do a rebalance before zorder in case data skew. Note that, if the insertion is dynamic partition we will use the partition columns to rebalance. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
| spark.sql.optimizer.rebalanceZorderColumns.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do rebalance before Z-Order. If it's dynamic partition insert, the rebalance expression will include both partition columns and Z-Order columns. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
@@ -84,8 +85,9 @@ Kyuubi provides some configs to make these feature easy to use.
| spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled | false | When true, add repartition even if the original plan does not have shuffle. | 1.7.0 |
| spark.sql.optimizer.finalStageConfigIsolationWriteOnly.enabled | true | When true, only enable final stage isolation for writing. | 1.7.0 |
| spark.sql.finalWriteStage.eagerlyKillExecutors.enabled | false | When true, eagerly kill redundant executors before running final write stage. | 1.8.0 |
+| spark.sql.finalWriteStage.skipKillingExecutorsForTableCache | true | When true, skip killing executors if the plan has table caches. | 1.8.0 |
| spark.sql.finalWriteStage.retainExecutorsFactor | 1.2 | If the target executors * factor < active executors, and target executors * factor > min executors, then inject kill executors or inject custom resource profile. | 1.8.0 |
-| spark.sql.finalWriteStage.resourceIsolation.enabled | false | When true, make final write stage resource isolation using custom RDD resource profile. | 1.2.0 |
+| spark.sql.finalWriteStage.resourceIsolation.enabled | false | When true, make final write stage resource isolation using custom RDD resource profile. | 1.8.0 |
| spark.sql.finalWriteStageExecutorCores | fallback spark.executor.cores | Specify the executor core request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
| spark.sql.finalWriteStageExecutorMemory | fallback spark.executor.memory | Specify the executor on heap memory request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
| spark.sql.finalWriteStageExecutorMemoryOverhead | fallback spark.executor.memoryOverhead | Specify the executor memory overhead request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
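The `spark.sql.finalStage.` prefix rule described for `spark.sql.optimizer.finalStageConfigIsolation.enabled` can be sketched as a simple key rewrite. This is a minimal illustration of the documented naming convention only, not Kyuubi's internal implementation:

```java
public class FinalStageKeyDemo {
    // Rewrite a raw Spark SQL config key into its final-stage counterpart by
    // inserting the documented "finalStage." segment after the "spark.sql." prefix.
    public static String toFinalStageKey(String rawKey) {
        return rawKey.replaceFirst("^spark\\.sql\\.", "spark.sql.finalStage.");
    }

    public static void main(String[] args) {
        // Example from the table: advisoryPartitionSizeInBytes
        System.out.println(toFinalStageKey("spark.sql.adaptive.advisoryPartitionSizeInBytes"));
    }
}
```

With `spark.sql.optimizer.finalStageConfigIsolation.enabled=true`, the rewritten key applies only to the final (write) stage, leaving earlier stages on the raw key's value.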
diff --git a/docs/extensions/server/authentication.rst b/docs/extensions/server/authentication.rst
index ab238040cda..7a83b07c285 100644
--- a/docs/extensions/server/authentication.rst
+++ b/docs/extensions/server/authentication.rst
@@ -49,12 +49,12 @@ To create custom Authenticator class derived from the above interface, we need t
- Referencing the library
-.. code-block:: xml
+.. parsed-literal::
org.apache.kyuubikyuubi-common_2.12
- 1.5.2-incubating
+ \ |release|\provided
diff --git a/docs/extensions/server/events.rst b/docs/extensions/server/events.rst
index 832c1e5df55..aee7d4899d2 100644
--- a/docs/extensions/server/events.rst
+++ b/docs/extensions/server/events.rst
@@ -51,12 +51,12 @@ To create custom EventHandlerProvider class derived from the above interface, we
- Referencing the library
-.. code-block:: xml
+.. parsed-literal::
org.apache.kyuubi
- kyuubi-event_2.12
- 1.7.0-incubating
+ kyuubi-events_2.12
+ \ |release|\provided
diff --git a/docs/index.rst b/docs/index.rst
index fbd299e7b86..e86041ffc0d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -179,6 +179,7 @@ What's Next
:glob:
quick_start/index
+ configuration/settings
deployment/index
Security
monitor/index
@@ -216,7 +217,13 @@ What's Next
:caption: Contributing
:maxdepth: 2
- develop_tools/index
+ contributing/code/index
+ contributing/doc/index
+
+.. toctree::
+ :caption: Community
+ :maxdepth: 2
+
community/index
.. toctree::
diff --git a/docs/quick_start/quick_start.rst b/docs/quick_start/quick_start.rst
index db564edb92c..2cf5f567fcb 100644
--- a/docs/quick_start/quick_start.rst
+++ b/docs/quick_start/quick_start.rst
@@ -43,8 +43,8 @@ pre-installed and the `JAVA_HOME` is correctly set to each component.
**Kyuubi** Gateway \ |release| \ - Kyuubi Server
Engine lib - Kyuubi Engine
Beeline - Kyuubi Hive Beeline
- **Spark** Engine >=3.0.0 A Spark distribution
- **Flink** Engine >=1.14.0 A Flink distribution
+ **Spark** Engine >=3.1 A Spark distribution
+ **Flink** Engine 1.16/1.17 A Flink distribution
**Trino** Engine >=363 A Trino cluster
**Doris** Engine N/A A Doris cluster
**Hive** Engine - 3.1.x - A Hive distribution
diff --git a/docs/quick_start/quick_start_with_helm.md b/docs/quick_start/quick_start_with_helm.md
index a2de5444560..0733a4de72b 100644
--- a/docs/quick_start/quick_start_with_helm.md
+++ b/docs/quick_start/quick_start_with_helm.md
@@ -15,7 +15,7 @@
- limitations under the License.
-->
-# Getting Started With Kyuubi on Kubernetes
+# Getting Started with Helm
## Running Kyuubi with Helm
diff --git a/docs/quick_start/quick_start_with_jdbc.md b/docs/quick_start/quick_start_with_jdbc.md
index c22cc1b65c1..e6f4f705296 100644
--- a/docs/quick_start/quick_start_with_jdbc.md
+++ b/docs/quick_start/quick_start_with_jdbc.md
@@ -15,82 +15,82 @@
- limitations under the License.
-->
-# Getting Started With Hive JDBC
+# Getting Started with Hive JDBC
-## How to install JDBC driver
+## How to get the Kyuubi JDBC driver
-Kyuubi JDBC driver is fully compatible with the 2.3.* version of hive JDBC driver, so we reuse hive JDBC driver to connect to Kyuubi server.
+Kyuubi's Thrift API is fully compatible with HiveServer2, so technically any Hive JDBC driver can be used to connect to
+Kyuubi Server. However, it's recommended to use the [Kyuubi Hive JDBC driver](../client/jdbc/kyuubi_jdbc), which is forked from
+the Hive 3.1.x JDBC driver and aims to support functionalities missing from the original Hive JDBC driver.
-Add repository to your maven configuration file which may reside in `$MAVEN_HOME/conf/settings.xml`.
+The driver is available from Maven Central:
```xml
-
-
- central maven repo
- central maven repo https
- https://repo.maven.apache.org/maven2
-
-
-```
-
-You can add below dependency to your `pom.xml` file in your application.
-
-```xml
-
-
- org.apache.hive
- hive-jdbc
- 2.3.7
-
- org.apache.hadoop
- hadoop-common
-
- 2.7.4
+ org.apache.kyuubi
+ kyuubi-hive-jdbc-shaded
+ 1.7.0
```
-## Use JDBC driver with kerberos
+## Connect to non-kerberized Kyuubi Server
The following Java code connects to Kyuubi Server by JDBC without any authentication configured.
```java
package org.apache.kyuubi.examples;
-import java.io.IOException;
-import java.security.PrivilegedExceptionAction;
import java.sql.*;
-import org.apache.hadoop.security.UserGroupInformation;
-
-public class JDBCTest {
-
- private static String driverName = "org.apache.hive.jdbc.HiveDriver";
- private static String kyuubiJdbcUrl = "jdbc:hive2://localhost:10009/default;";
-
- public static void main(String[] args) throws ClassNotFoundException, SQLException {
- String principal = args[0]; // kerberos principal
- String keytab = args[1]; // keytab file location
- Configuration configuration = new Configuration();
- configuration.set(HADOOP_SECURITY_AUTHENTICATION, "kerberos");
- UserGroupInformation.setConfiguration(configuration);
- UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
-
- Class.forName(driverName);
- Connection conn = ugi.doAs(new PrivilegedExceptionAction(){
- public Connection run() throws SQLException {
- return DriverManager.getConnection(kyuubiJdbcUrl);
- }
- });
- Statement st = conn.createStatement();
- ResultSet res = st.executeQuery("show databases");
- while (res.next()) {
- System.out.println(res.getString(1));
+public class KyuubiJDBC {
+
+ private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
+ private static String kyuubiJdbcUrl = "jdbc:kyuubi://localhost:10009/default;";
+
+ public static void main(String[] args) throws SQLException {
+ try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
+ try (Statement stmt = conn.createStatement()) {
+ try (ResultSet rs = stmt.executeQuery("show databases")) {
+ while (rs.next()) {
+ System.out.println(rs.getString(1));
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+## Connect to Kerberized Kyuubi Server
+
+The following Java code uses a keytab file to login and connect to Kyuubi Server by JDBC.
+
+```java
+package org.apache.kyuubi.examples;
+
+import java.sql.*;
+
+public class KyuubiJDBCDemo {
+
+ private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
+ private static String kyuubiJdbcUrlTemplate = "jdbc:kyuubi://localhost:10009/default;" +
+ "kyuubiClientPrincipal=%s;kyuubiClientKeytab=%s;kyuubiServerPrincipal=%s";
+
+ public static void main(String[] args) throws SQLException {
+ String clientPrincipal = args[0]; // Kerberos principal
+ String clientKeytab = args[1]; // Keytab file location
+ String serverPrincipal = args[2]; // Kerberos principal used by Kyuubi Server
+ String kyuubiJdbcUrl = String.format(kyuubiJdbcUrlTemplate, clientPrincipal, clientKeytab, serverPrincipal);
+ try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
+ try (Statement stmt = conn.createStatement()) {
+ try (ResultSet rs = stmt.executeQuery("show databases")) {
+ while (rs.next()) {
+ System.out.println(rs.getString(1));
+ }
}
- res.close();
- st.close();
- conn.close();
+ }
}
+ }
}
```
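The kerberized JDBC URL in `KyuubiJDBCDemo` is built by plain `String.format` substitution. As a sketch, the template expands as follows; the principal and keytab values used here are placeholders, not real credentials:

```java
public class KyuubiJdbcUrlDemo {
    // Same URL template as KyuubiJDBCDemo above: client principal, keytab path,
    // and the Kyuubi Server principal are substituted into the JDBC URL.
    public static final String TEMPLATE = "jdbc:kyuubi://localhost:10009/default;"
        + "kyuubiClientPrincipal=%s;kyuubiClientKeytab=%s;kyuubiServerPrincipal=%s";

    public static String build(String clientPrincipal, String clientKeytab, String serverPrincipal) {
        return String.format(TEMPLATE, clientPrincipal, clientKeytab, serverPrincipal);
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        System.out.println(build("client@EXAMPLE.COM", "/path/to/client.keytab", "kyuubi/host@EXAMPLE.COM"));
    }
}
```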
diff --git a/docs/quick_start/quick_start_with_jupyter.md b/docs/quick_start/quick_start_with_jupyter.md
index 44b3faa5786..608da92846e 100644
--- a/docs/quick_start/quick_start_with_jupyter.md
+++ b/docs/quick_start/quick_start_with_jupyter.md
@@ -15,5 +15,5 @@
- limitations under the License.
-->
-# Getting Started With Hive Jupyter Lap
+# Getting Started with Jupyter Lab
diff --git a/docs/requirements.txt b/docs/requirements.txt
index ecc8116e77d..8e1f5c47119 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -24,3 +24,5 @@ sphinx-book-theme==0.3.3
sphinx-markdown-tables==0.0.17
sphinx-notfound-page==0.8.3
sphinx-togglebutton===0.3.2
+sphinxemoji===0.2.0
+sphinx-copybutton===0.5.2
diff --git a/docs/security/authentication.rst b/docs/security/authentication.rst
index f16a452c8c2..00bf368ff11 100644
--- a/docs/security/authentication.rst
+++ b/docs/security/authentication.rst
@@ -43,4 +43,4 @@ The related configurations can be found at `Authentication Configurations`_
jdbc
../extensions/server/authentication
-.. _Authentication Configurations: ../deployment/settings.html#authentication
+.. _Authentication Configurations: ../configuration/settings.html#authentication
diff --git a/docs/security/authorization/spark/build.md b/docs/security/authorization/spark/build.md
index 3886f08dfa3..7e38f2eed19 100644
--- a/docs/security/authorization/spark/build.md
+++ b/docs/security/authorization/spark/build.md
@@ -68,17 +68,18 @@ build/mvn clean package -pl :kyuubi-spark-authz_2.12 -DskipTests -Dranger.versio
The available `ranger.version`s are shown in the following table.
-| Ranger Version | Supported | Remark |
-|:--------------:|:---------:|:------:|
-| 2.3.x | √ | - |
-| 2.2.x | √ | - |
-| 2.1.x | √ | - |
-| 2.0.x | √ | - |
-| 1.2.x | √ | - |
-| 1.1.x | √ | - |
-| 1.0.x | √ | - |
-| 0.7.x | √ | - |
-| 0.6.x | √ | - |
+| Ranger Version | Supported | Remark |
+|:--------------:|:---------:|:-----------------------------------------------------------------------------------------:|
+| 2.4.x | √ | - |
+| 2.3.x | √ | - |
+| 2.2.x | √ | - |
+| 2.1.x | √ | - |
+| 2.0.x | √ | - |
+| 1.2.x | √ | - |
+| 1.1.x | √ | - |
+| 1.0.x | √ | - |
+| 0.7.x | √ | - |
+| 0.6.x | X | [KYUUBI-4672](https://github.com/apache/kyuubi/issues/4672) reported unresolved failures. |
Currently, all Ranger releases except 0.6.x are supported.
diff --git a/docs/security/authorization/spark/overview.rst b/docs/security/authorization/spark/overview.rst
index fcbaa880b60..364d6485fe7 100644
--- a/docs/security/authorization/spark/overview.rst
+++ b/docs/security/authorization/spark/overview.rst
@@ -106,4 +106,4 @@ You can specify config `spark.kyuubi.conf.restricted.list` values to disable cha
2. A set statement with key equal to `spark.sql.optimizer.excludedRules` and value containing `org.apache.kyuubi.plugin.spark.authz.ranger.*` also does not allow modification.
.. _Apache Ranger: https://ranger.apache.org/
-.. _Spark Configurations: ../../../deployment/settings.html#spark-configurations
+.. _Spark Configurations: ../../../configuration/settings.html#spark-configurations
diff --git a/docs/security/ldap.md b/docs/security/ldap.md
new file mode 100644
index 00000000000..7994afb5142
--- /dev/null
+++ b/docs/security/ldap.md
@@ -0,0 +1,60 @@
+
+
+# Configure Kyuubi to use LDAP Authentication
+
+Kyuubi can be configured to enable frontend LDAP authentication for clients such as BeeLine, or the JDBC and ODBC drivers.
+At present, only the simple LDAP authentication mechanism involving a username and password is supported. The client sends
+a username and password to the Kyuubi server, and the Kyuubi server validates these credentials using an external LDAP service.
+
+## Enable LDAP Authentication
+
+To enable LDAP authentication for Kyuubi, LDAP-related configurations are required to be set in
+`$KYUUBI_HOME/conf/kyuubi-defaults.conf` on each node where Kyuubi server is installed.
+
+For example,
+
+```properties
+kyuubi.authentication=LDAP
+kyuubi.authentication.ldap.baseDN=dc=org
+kyuubi.authentication.ldap.domain=apache.org
+kyuubi.authentication.ldap.binddn=uid=kyuubi,OU=Users,DC=apache,DC=org
+kyuubi.authentication.ldap.bindpw=kyuubi123123
+kyuubi.authentication.ldap.url=ldap://hostname.com:389/
+```
+
+## User and Group Filter in LDAP
+
+Kyuubi also supports complex LDAP cases as [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2#UserandGroupFilterSupportwithLDAPAtnProviderinHiveServer2-UserandGroupFilterSupportwithLDAP) does.
+
+For example,
+
+```properties
+# Group Membership
+kyuubi.authentication.ldap.groupClassKey=groupOfNames
+kyuubi.authentication.ldap.groupDNPattern=CN=%s,OU=Groups,DC=apache,DC=org
+kyuubi.authentication.ldap.groupFilter=group1,group2
+kyuubi.authentication.ldap.groupMembershipKey=memberUid
+# User Search List
+kyuubi.authentication.ldap.userDNPattern=CN=%s,CN=Users,DC=apache,DC=org
+kyuubi.authentication.ldap.userFilter=hive-admin,hive,hive-test,hive-user
+# Custom Query
+kyuubi.authentication.ldap.customLDAPQuery=(&(objectClass=group)(objectClass=top)(instanceType=4)(cn=Domain*)), (&(objectClass=person)(|(sAMAccountName=admin)(|(memberOf=CN=Domain Admins,CN=Users,DC=domain,DC=com)(memberOf=CN=Administrators,CN=Builtin,DC=domain,DC=com))))
+```
+
+Please refer to [Settings for LDAP authentication in Kyuubi](../configuration/settings.html?highlight=LDAP#authentication)
+for all configurations.
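With LDAP authentication enabled, JDBC clients pass the LDAP username and password as standard JDBC connection properties. The following is a minimal sketch under that assumption; the credential values are placeholders and no live server is contacted here:

```java
import java.util.Properties;

public class LdapCredentialsDemo {
    // Assemble the standard JDBC "user"/"password" properties carrying LDAP
    // credentials; they would be handed to
    // DriverManager.getConnection(url, props) against a running Kyuubi server.
    public static Properties ldapCredentials(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    public static void main(String[] args) {
        Properties props = ldapCredentials("kyuubi", "secret");
        System.out.println(props.getProperty("user"));
    }
}
```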
diff --git a/docs/security/ldap.rst b/docs/security/ldap.rst
deleted file mode 100644
index 35cfcd6decf..00000000000
--- a/docs/security/ldap.rst
+++ /dev/null
@@ -1,21 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
-.. http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-
-
-Configure Kyuubi to use LDAP Authentication
-===============================================
-
-.. warning::
- the page is still in-progress.
diff --git a/docs/tools/kyuubi-admin.rst b/docs/tools/kyuubi-admin.rst
index 6063965938c..29149e92f5f 100644
--- a/docs/tools/kyuubi-admin.rst
+++ b/docs/tools/kyuubi-admin.rst
@@ -73,6 +73,8 @@ Usage: ``bin/kyuubi-admin refresh config [options] []``
- The user default configs, with keys in the form of `___{username}___.{config key}`, from the default property file.
* - unlimitedUsers
- The users without maximum connections limitation.
+ * - denyUsers
+ - Users in the deny list will be denied connections to the Kyuubi server.
.. _list_engine:
@@ -98,6 +100,15 @@ Usage: ``bin/kyuubi-admin list engine [options]``
* - --hs2ProxyUser
- The proxy user to impersonate. When specified, it will list engines for the hs2ProxyUser.
+.. _list_server:
+
+List Servers
+-------------------------------------
+
+Prints a table of the key information about the servers.
+
+Usage: ``bin/kyuubi-admin list server``
+
.. _delete_engine:
Delete an Engine
diff --git a/extensions/server/kyuubi-server-plugin/pom.xml b/extensions/server/kyuubi-server-plugin/pom.xml
index 799f27c4632..12c1699fc02 100644
--- a/extensions/server/kyuubi-server-plugin/pom.xml
+++ b/extensions/server/kyuubi-server-plugin/pom.xml
@@ -21,7 +21,7 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
diff --git a/extensions/spark/kyuubi-extension-spark-3-1/pom.xml b/extensions/spark/kyuubi-extension-spark-3-1/pom.xml
index 9f218f9d0fe..a7fcbabe5b4 100644
--- a/extensions/spark/kyuubi-extension-spark-3-1/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-3-1/pom.xml
@@ -21,11 +21,11 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
- kyuubi-extension-spark-3-1_2.12
+ kyuubi-extension-spark-3-1_${scala.binary.version}jarKyuubi Dev Spark Extensions (for Spark 3.1)https://kyuubi.apache.org/
@@ -125,10 +125,21 @@
jakarta.xml.bind-apitest
+
+
+ org.apache.logging.log4j
+ log4j-1.2-api
+ test
+
+
+
+ org.apache.logging.log4j
+ log4j-slf4j-impl
+ test
+
-
org.apache.maven.plugins
@@ -137,7 +148,7 @@
false
- org.apache.kyuubi:kyuubi-extension-spark-common_${scala.binary.version}
+ org.apache.kyuubi:*
diff --git a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
index cd312de953b..f952b56f387 100644
--- a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.sql
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.kyuubi.sql.sqlclassification.KyuubiSqlClassification
-import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxPartitionStrategy}
+import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxScanStrategy}
// scalastyle:off line.size.limit
/**
@@ -40,6 +40,6 @@ class KyuubiSparkSQLExtension extends (SparkSessionExtensions => Unit) {
// watchdog extension
extensions.injectOptimizerRule(ForcedMaxOutputRowsRule)
- extensions.injectPlannerStrategy(MaxPartitionStrategy)
+ extensions.injectPlannerStrategy(MaxScanStrategy)
}
}
diff --git a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
index 2f12a82e23e..87c10bc3467 100644
--- a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
@@ -21,19 +21,21 @@ import org.antlr.v4.runtime._
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.{Interval, ParseCancellationException}
import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, SQLConfHelper, TableIdentifier}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.parser.{ParseErrorListener, ParseException, ParserInterface, PostProcessor}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.Origin
import org.apache.spark.sql.types.{DataType, StructType}
-abstract class KyuubiSparkSQLParserBase extends ParserInterface {
+abstract class KyuubiSparkSQLParserBase extends ParserInterface with SQLConfHelper {
def delegate: ParserInterface
- def astBuilder: KyuubiSparkSQLAstBuilderBase
+ def astBuilder: KyuubiSparkSQLAstBuilder
override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
astBuilder.visit(parser.singleStatement()) match {
+ case optimize: UnparsedPredicateOptimize =>
+ astBuilder.buildOptimizeStatement(optimize, delegate.parseExpression)
case plan: LogicalPlan => plan
case _ => delegate.parsePlan(sqlText)
}
@@ -105,7 +107,7 @@ abstract class KyuubiSparkSQLParserBase extends ParserInterface {
class SparkKyuubiSparkSQLParser(
override val delegate: ParserInterface)
extends KyuubiSparkSQLParserBase {
- def astBuilder: KyuubiSparkSQLAstBuilderBase = new KyuubiSparkSQLAstBuilder
+ def astBuilder: KyuubiSparkSQLAstBuilder = new KyuubiSparkSQLAstBuilder
}
/* Copied from Apache Spark's to avoid dependency on Spark Internals */
diff --git a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/sqlclassification/KyuubiGetSqlClassification.scala b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/sqlclassification/KyuubiGetSqlClassification.scala
index e8aadc85029..b94cdf34674 100644
--- a/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/sqlclassification/KyuubiGetSqlClassification.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-1/src/main/scala/org/apache/kyuubi/sql/sqlclassification/KyuubiGetSqlClassification.scala
@@ -55,7 +55,7 @@ object KyuubiGetSqlClassification extends Logging {
* You need to make sure that the configuration item: SQL_CLASSIFICATION_ENABLED
* is true
* @param simpleName: the analyzed logical plan's getSimpleName
- * @return: This sql's classification
+ * @return This sql's classification
*/
def getSqlClassification(simpleName: String): String = {
jsonNode.map { json =>
diff --git a/extensions/spark/kyuubi-extension-spark-3-1/src/test/scala/org/apache/spark/sql/ZorderSuite.scala b/extensions/spark/kyuubi-extension-spark-3-1/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
index fd04e27dbb5..29a166abf3f 100644
--- a/extensions/spark/kyuubi-extension-spark-3-1/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-1/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
@@ -17,6 +17,20 @@
package org.apache.spark.sql
-class ZorderWithCodegenEnabledSuite extends ZorderWithCodegenEnabledSuiteBase {}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
-class ZorderWithCodegenDisabledSuite extends ZorderWithCodegenDisabledSuiteBase {}
+import org.apache.kyuubi.sql.SparkKyuubiSparkSQLParser
+
+trait ParserSuite { self: ZorderSuiteBase =>
+ override def createParser: ParserInterface = {
+ new SparkKyuubiSparkSQLParser(spark.sessionState.sqlParser)
+ }
+}
+
+class ZorderWithCodegenEnabledSuite
+ extends ZorderWithCodegenEnabledSuiteBase
+ with ParserSuite {}
+
+class ZorderWithCodegenDisabledSuite
+ extends ZorderWithCodegenDisabledSuiteBase
+ with ParserSuite {}
diff --git a/extensions/spark/kyuubi-extension-spark-3-2/pom.xml b/extensions/spark/kyuubi-extension-spark-3-2/pom.xml
index a80040aca65..b1ddcecf84e 100644
--- a/extensions/spark/kyuubi-extension-spark-3-2/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-3-2/pom.xml
@@ -21,11 +21,11 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
- kyuubi-extension-spark-3-2_2.12
+ kyuubi-extension-spark-3-2_${scala.binary.version}jarKyuubi Dev Spark Extensions (for Spark 3.2)https://kyuubi.apache.org/
@@ -125,10 +125,21 @@
jakarta.xml.bind-apitest
+
+
+ org.apache.logging.log4j
+ log4j-1.2-api
+ test
+
+
+
+ org.apache.logging.log4j
+ log4j-slf4j-impl
+ test
+
-
org.apache.maven.plugins
@@ -137,7 +148,7 @@
false
- org.apache.kyuubi:kyuubi-extension-spark-common_${scala.binary.version}
+ org.apache.kyuubi:*
diff --git a/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala b/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
index ef9da41be13..97e77704293 100644
--- a/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.sql
import org.apache.spark.sql.SparkSessionExtensions
-import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxPartitionStrategy}
+import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxScanStrategy}
// scalastyle:off line.size.limit
/**
@@ -38,6 +38,6 @@ class KyuubiSparkSQLExtension extends (SparkSessionExtensions => Unit) {
// watchdog extension
extensions.injectOptimizerRule(ForcedMaxOutputRowsRule)
- extensions.injectPlannerStrategy(MaxPartitionStrategy)
+ extensions.injectPlannerStrategy(MaxScanStrategy)
}
}
diff --git a/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala b/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
index 2f12a82e23e..87c10bc3467 100644
--- a/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-2/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
@@ -21,19 +21,21 @@ import org.antlr.v4.runtime._
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.{Interval, ParseCancellationException}
import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, SQLConfHelper, TableIdentifier}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.parser.{ParseErrorListener, ParseException, ParserInterface, PostProcessor}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.Origin
import org.apache.spark.sql.types.{DataType, StructType}
-abstract class KyuubiSparkSQLParserBase extends ParserInterface {
+abstract class KyuubiSparkSQLParserBase extends ParserInterface with SQLConfHelper {
def delegate: ParserInterface
- def astBuilder: KyuubiSparkSQLAstBuilderBase
+ def astBuilder: KyuubiSparkSQLAstBuilder
override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
astBuilder.visit(parser.singleStatement()) match {
+ case optimize: UnparsedPredicateOptimize =>
+ astBuilder.buildOptimizeStatement(optimize, delegate.parseExpression)
case plan: LogicalPlan => plan
case _ => delegate.parsePlan(sqlText)
}
@@ -105,7 +107,7 @@ abstract class KyuubiSparkSQLParserBase extends ParserInterface {
class SparkKyuubiSparkSQLParser(
override val delegate: ParserInterface)
extends KyuubiSparkSQLParserBase {
- def astBuilder: KyuubiSparkSQLAstBuilderBase = new KyuubiSparkSQLAstBuilder
+ def astBuilder: KyuubiSparkSQLAstBuilder = new KyuubiSparkSQLAstBuilder
}
/* Copied from Apache Spark's to avoid dependency on Spark Internals */
diff --git a/extensions/spark/kyuubi-extension-spark-3-2/src/test/scala/org/apache/spark/sql/ZorderSuite.scala b/extensions/spark/kyuubi-extension-spark-3-2/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
index fd04e27dbb5..29a166abf3f 100644
--- a/extensions/spark/kyuubi-extension-spark-3-2/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-2/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
@@ -17,6 +17,20 @@
package org.apache.spark.sql
-class ZorderWithCodegenEnabledSuite extends ZorderWithCodegenEnabledSuiteBase {}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
-class ZorderWithCodegenDisabledSuite extends ZorderWithCodegenDisabledSuiteBase {}
+import org.apache.kyuubi.sql.SparkKyuubiSparkSQLParser
+
+trait ParserSuite { self: ZorderSuiteBase =>
+ override def createParser: ParserInterface = {
+ new SparkKyuubiSparkSQLParser(spark.sessionState.sqlParser)
+ }
+}
+
+class ZorderWithCodegenEnabledSuite
+ extends ZorderWithCodegenEnabledSuiteBase
+ with ParserSuite {}
+
+class ZorderWithCodegenDisabledSuite
+ extends ZorderWithCodegenDisabledSuiteBase
+ with ParserSuite {}
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/pom.xml b/extensions/spark/kyuubi-extension-spark-3-3/pom.xml
index ca729a7819b..9b1a30af060 100644
--- a/extensions/spark/kyuubi-extension-spark-3-3/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-3-3/pom.xml
@@ -21,11 +21,11 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
- kyuubi-extension-spark-3-3_2.12
+ kyuubi-extension-spark-3-3_${scala.binary.version}jarKyuubi Dev Spark Extensions (for Spark 3.3)https://kyuubi.apache.org/
@@ -37,6 +37,14 @@
            <version>${project.version}</version>
        </dependency>
+
+       <dependency>
+           <groupId>org.apache.kyuubi</groupId>
+           <artifactId>kyuubi-download</artifactId>
+           <version>${project.version}</version>
+           <type>pom</type>
+           <scope>test</scope>
+       </dependency>
+
        <dependency>
            <groupId>org.apache.kyuubi</groupId>
            <artifactId>kyuubi-extension-spark-common_${scala.binary.version}</artifactId>
@@ -45,6 +53,14 @@
            <scope>test</scope>
        </dependency>
+
+       <dependency>
+           <groupId>org.apache.kyuubi</groupId>
+           <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+           <version>${project.version}</version>
+           <type>test-jar</type>
+           <scope>test</scope>
+       </dependency>
+
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
@@ -130,6 +146,38 @@
+           <plugin>
+               <groupId>org.codehaus.mojo</groupId>
+               <artifactId>build-helper-maven-plugin</artifactId>
+               <executions>
+                   <execution>
+                       <id>regex-property</id>
+                       <goals>
+                           <goal>regex-property</goal>
+                       </goals>
+                       <configuration>
+                           <name>spark.home</name>
+                           <value>${project.basedir}/../../../externals/kyuubi-download/target/${spark.archive.name}</value>
+                           <regex>(.+)\.tgz</regex>
+                           <replacement>$1</replacement>
+                       </configuration>
+                   </execution>
+               </executions>
+           </plugin>
+
+           <plugin>
+               <groupId>org.scalatest</groupId>
+               <artifactId>scalatest-maven-plugin</artifactId>
+               <configuration>
+                   <environmentVariables>
+                       <SPARK_HOME>${spark.home}</SPARK_HOME>
+                       <SPARK_SCALA_VERSION>${scala.binary.version}</SPARK_SCALA_VERSION>
+                   </environmentVariables>
+               </configuration>
+           </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
@@ -137,7 +185,7 @@
                    <createDependencyReducedPom>false</createDependencyReducedPom>
-                       <include>org.apache.kyuubi:kyuubi-extension-spark-common_${scala.binary.version}</include>
+                       <include>org.apache.kyuubi:*</include>
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
index 0db9b3ab88a..792315d897a 100644
--- a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.sql
import org.apache.spark.sql.{FinalStageResourceManager, InjectCustomResourceProfile, SparkSessionExtensions}
-import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxPartitionStrategy}
+import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxScanStrategy}
// scalastyle:off line.size.limit
/**
@@ -38,9 +38,9 @@ class KyuubiSparkSQLExtension extends (SparkSessionExtensions => Unit) {
// watchdog extension
extensions.injectOptimizerRule(ForcedMaxOutputRowsRule)
- extensions.injectPlannerStrategy(MaxPartitionStrategy)
+ extensions.injectPlannerStrategy(MaxScanStrategy)
- extensions.injectQueryStagePrepRule(FinalStageResourceManager)
+ extensions.injectQueryStagePrepRule(FinalStageResourceManager(_))
extensions.injectQueryStagePrepRule(InjectCustomResourceProfile)
}
}
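
The switch to `FinalStageResourceManager(_)` above deserves a note. `injectQueryStagePrepRule` expects a builder function (`SparkSession => Rule[SparkPlan]`), and this same patch adds an explicit companion `object FinalStageResourceManager`; once a case class has a hand-written companion, that object no longer extends `Function1`, so the bare object can't be passed where a function is expected and the `apply` must be eta-expanded explicitly. A minimal, self-contained sketch with stand-in types (all names here are illustrative, not Spark's):

```scala
// Stand-in for a case class whose explicit companion breaks bare-object passing.
case class MyRule(n: Int)
object MyRule // explicit companion: no synthetic Function1 parent is generated

object EtaExpansionSketch {
  // Stand-in for injectQueryStagePrepRule's builder-function parameter.
  def inject(builder: Int => MyRule): MyRule = builder(1)

  def main(args: Array[String]): Unit = {
    // inject(MyRule)  // would not compile: object MyRule is not an Int => MyRule
    println(inject(MyRule(_)).n) // explicit eta-expansion of the companion's apply
  }
}
```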
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
index af1711ebbe7..c4418c33c44 100644
--- a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
@@ -21,19 +21,21 @@ import org.antlr.v4.runtime._
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.{Interval, ParseCancellationException}
import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, SQLConfHelper, TableIdentifier}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.parser.{ParseErrorListener, ParseException, ParserInterface, PostProcessor}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.Origin
import org.apache.spark.sql.types.{DataType, StructType}
-abstract class KyuubiSparkSQLParserBase extends ParserInterface {
+abstract class KyuubiSparkSQLParserBase extends ParserInterface with SQLConfHelper {
def delegate: ParserInterface
- def astBuilder: KyuubiSparkSQLAstBuilderBase
+ def astBuilder: KyuubiSparkSQLAstBuilder
override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
astBuilder.visit(parser.singleStatement()) match {
+ case optimize: UnparsedPredicateOptimize =>
+ astBuilder.buildOptimizeStatement(optimize, delegate.parseExpression)
case plan: LogicalPlan => plan
case _ => delegate.parsePlan(sqlText)
}
@@ -113,7 +115,7 @@ abstract class KyuubiSparkSQLParserBase extends ParserInterface {
class SparkKyuubiSparkSQLParser(
override val delegate: ParserInterface)
extends KyuubiSparkSQLParserBase {
- def astBuilder: KyuubiSparkSQLAstBuilderBase = new KyuubiSparkSQLAstBuilder
+ def astBuilder: KyuubiSparkSQLAstBuilder = new KyuubiSparkSQLAstBuilder
}
/* Copied from Apache Spark's to avoid dependency on Spark Internals */
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala
index 2bf7ae6b75e..32fb9f5ce84 100644
--- a/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-3/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala
@@ -22,10 +22,13 @@ import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.{ExecutorAllocationClient, MapOutputTrackerMaster, SparkContext, SparkEnv}
+import org.apache.spark.internal.Logging
+import org.apache.spark.resource.ResourceProfile
import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SortExec, SparkPlan}
import org.apache.spark.sql.execution.adaptive._
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
import org.apache.spark.sql.execution.exchange.{ENSURE_REQUIREMENTS, ShuffleExchangeExec}
import org.apache.kyuubi.sql.{KyuubiSQLConf, MarkNumOutputColumnsRule}
@@ -69,6 +72,14 @@ case class FinalStageResourceManager(session: SparkSession)
return plan
}
+ // It's not safe to kill executors if this plan contains a table cache.
+ // If an executor is lost, the RDD would need to re-compute those cached partitions.
+ if (hasTableCache(plan) &&
+ conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_SKIP_KILLING_EXECUTORS_FOR_TABLE_CACHE)) {
+ return plan
+ }
+
+ // TODO: move this to query stage optimizer when updating Spark to 3.5.x
// Since we are in `prepareQueryStage`, the AQE shuffle read has not been applied.
// So we need to apply it ourselves.
val shuffleRead = queryStageOptimizerRules.foldLeft(stageOpt.get.asInstanceOf[SparkPlan]) {
@@ -119,7 +130,11 @@ case class FinalStageResourceManager(session: SparkSession)
shuffleId: Int,
numReduce: Int): Seq[String] = {
val tracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
- val shuffleStatus = tracker.shuffleStatuses(shuffleId)
+ val shuffleStatusOpt = tracker.shuffleStatuses.get(shuffleId)
+ if (shuffleStatusOpt.isEmpty) {
+ return Seq.empty
+ }
+ val shuffleStatus = shuffleStatusOpt.get
val executorToBlockSize = new mutable.HashMap[String, Long]
shuffleStatus.withMapStatuses { mapStatus =>
mapStatus.foreach { status =>
@@ -157,7 +172,7 @@ case class FinalStageResourceManager(session: SparkSession)
// Evict the rest executors according to the shuffle block size
executorToBlockSize.toSeq.sortBy(_._2).foreach { case (id, _) =>
- if (executorIdsToKill.length < expectedNumExecutorToKill) {
+ if (executorIdsToKill.length < expectedNumExecutorToKill && existedExecutors.contains(id)) {
executorIdsToKill.append(id)
}
}
@@ -172,19 +187,44 @@ case class FinalStageResourceManager(session: SparkSession)
numReduce: Int): Unit = {
val executorAllocationClient = sc.schedulerBackend.asInstanceOf[ExecutorAllocationClient]
- val executorsToKill = findExecutorToKill(sc, targetExecutors, shuffleId, numReduce)
+ val executorsToKill =
+ if (conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL)) {
+ executorAllocationClient.getExecutorIds()
+ } else {
+ findExecutorToKill(sc, targetExecutors, shuffleId, numReduce)
+ }
logInfo(s"Request to kill executors, total count ${executorsToKill.size}, " +
s"[${executorsToKill.mkString(", ")}].")
+ if (executorsToKill.isEmpty) {
+ return
+ }
// Note, `SparkContext#killExecutors` is not allowed with DRA enabled,
// see `https://github.com/apache/spark/pull/20604`.
// It may cause the status in `ExecutorAllocationManager` to be inconsistent with
// `CoarseGrainedSchedulerBackend` for a while, but they should eventually become consistent.
+ //
+ // We should adjust target num executors, otherwise `YarnAllocator` might re-request original
+ // target executors if DRA has not updated target executors yet.
+ // Note, DRA would re-adjust executors if there are more tasks to be executed, so we are safe.
+ //
+ // * We kill executor
+ // * YarnAllocator re-request target executors
+ // * DRA can not release executors since they are new added
+ // ----------------------------------------------------------------> timeline
executorAllocationClient.killExecutors(
executorIds = executorsToKill,
- adjustTargetNumExecutors = false,
+ adjustTargetNumExecutors = true,
countFailures = false,
force = false)
+
+ FinalStageResourceManager.getAdjustedTargetExecutors(sc)
+ .filter(_ < targetExecutors).foreach { adjustedExecutors =>
+ val delta = targetExecutors - adjustedExecutors
+ logInfo(s"Target executors after kill ($adjustedExecutors) is lower than required " +
+ s"($targetExecutors). Requesting $delta additional executor(s).")
+ executorAllocationClient.requestExecutors(delta)
+ }
}
@transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
@@ -193,7 +233,32 @@ case class FinalStageResourceManager(session: SparkSession)
OptimizeShuffleWithLocalRead)
}
-trait FinalRebalanceStageHelper {
+object FinalStageResourceManager extends Logging {
+
+ private[sql] def getAdjustedTargetExecutors(sc: SparkContext): Option[Int] = {
+ sc.schedulerBackend match {
+ case schedulerBackend: CoarseGrainedSchedulerBackend =>
+ try {
+ val field = classOf[CoarseGrainedSchedulerBackend]
+ .getDeclaredField("requestedTotalExecutorsPerResourceProfile")
+ field.setAccessible(true)
+ schedulerBackend.synchronized {
+ val requestedTotalExecutorsPerResourceProfile =
+ field.get(schedulerBackend).asInstanceOf[mutable.HashMap[ResourceProfile, Int]]
+ val defaultRp = sc.resourceProfileManager.defaultResourceProfile
+ requestedTotalExecutorsPerResourceProfile.get(defaultRp)
+ }
+ } catch {
+ case e: Exception =>
+ logWarning("Failed to get requestedTotalExecutors of Default ResourceProfile", e)
+ None
+ }
+ case _ => None
+ }
+ }
+}
+
+trait FinalRebalanceStageHelper extends AdaptiveSparkPlanHelper {
@tailrec
final protected def findFinalRebalanceStage(plan: SparkPlan): Option[ShuffleQueryStageExec] = {
plan match {
@@ -201,11 +266,18 @@ trait FinalRebalanceStageHelper {
case f: FilterExec => findFinalRebalanceStage(f.child)
case s: SortExec if !s.global => findFinalRebalanceStage(s.child)
case stage: ShuffleQueryStageExec
- if stage.isMaterialized &&
+ if stage.isMaterialized && stage.mapStats.isDefined &&
stage.plan.isInstanceOf[ShuffleExchangeExec] &&
stage.plan.asInstanceOf[ShuffleExchangeExec].shuffleOrigin != ENSURE_REQUIREMENTS =>
Some(stage)
case _ => None
}
}
+
+ final protected def hasTableCache(plan: SparkPlan): Boolean = {
+ find(plan) {
+ case _: InMemoryTableScanExec => true
+ case _ => false
+ }.isDefined
+ }
}
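
The `getAdjustedTargetExecutors` helper above reaches into Spark's scheduler backend via reflection on a private field. The pattern can be sketched in isolation; this is a self-contained example with a stand-in class and a hypothetical field name, not Spark's actual `requestedTotalExecutorsPerResourceProfile`:

```scala
// Stand-in for a class holding private scheduler state.
class Backend {
  private val requestedTotal: Int = 7 // illustrative value
}

object ReflectionSketch {
  // Read a private field by name, as FinalStageResourceManager does for
  // diagnostics-style access; wrap in try/catch in real code, since the
  // field name is an implementation detail that can change between versions.
  def readPrivateField(b: Backend): Int = {
    val field = classOf[Backend].getDeclaredField("requestedTotal")
    field.setAccessible(true) // bypass `private` access checks
    field.get(b).asInstanceOf[Int]
  }

  def main(args: Array[String]): Unit =
    println(readPrivateField(new Backend)) // prints 7
}
```

This is why the real helper logs a warning and returns `None` on any exception: the code must degrade gracefully if Spark renames the field.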
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala b/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala
new file mode 100644
index 00000000000..4b9991ef6f2
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkConf
+import org.scalatest.time.{Minutes, Span}
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+import org.apache.kyuubi.tags.SparkLocalClusterTest
+
+@SparkLocalClusterTest
+class FinalStageResourceManagerSuite extends KyuubiSparkSQLExtensionTest {
+
+ override def sparkConf(): SparkConf = {
+ // It is difficult to run spark in local-cluster mode when spark.testing is set.
+ sys.props.remove("spark.testing")
+
+ super.sparkConf().set("spark.master", "local-cluster[3, 1, 1024]")
+ .set("spark.dynamicAllocation.enabled", "true")
+ .set("spark.dynamicAllocation.initialExecutors", "3")
+ .set("spark.dynamicAllocation.minExecutors", "1")
+ .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
+ .set(KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key, "true")
+ .set(KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_ENABLED.key, "true")
+ }
+
+ test("[KYUUBI #5136][Bug] Final Stage hangs forever") {
+ // Prerequisite to reproduce the bug:
+ // 1. Dynamic allocation is enabled.
+ // 2. Dynamic allocation min executors is 1.
+ // 3. target executors < active executors.
+ // 4. No active executor is left after FinalStageResourceManager killed executors.
+ // This is possible because the executors retained by FinalStageResourceManager may
+ // already have been requested to be killed, but have not died yet.
+ // 5. Final Stage required executors is 1.
+ withSQLConf(
+ (KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL.key, "true")) {
+ withTable("final_stage") {
+ eventually(timeout(Span(10, Minutes))) {
+ sql(
+ "CREATE TABLE final_stage AS SELECT id, count(*) as num FROM (SELECT 0 id) GROUP BY id")
+ }
+ assert(FinalStageResourceManager.getAdjustedTargetExecutors(spark.sparkContext).get == 1)
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/ZorderSuite.scala b/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
index 90fc17e2430..a08366f1d4a 100644
--- a/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-3/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
@@ -17,13 +17,14 @@
package org.apache.spark.sql
+import org.apache.spark.sql.catalyst.parser.ParserInterface
import org.apache.spark.sql.catalyst.plans.logical.{RebalancePartitions, Sort}
import org.apache.spark.sql.internal.SQLConf
-import org.apache.kyuubi.sql.KyuubiSQLConf
+import org.apache.kyuubi.sql.{KyuubiSQLConf, SparkKyuubiSparkSQLParser}
import org.apache.kyuubi.sql.zorder.Zorder
-trait ZorderWithCodegenEnabledSuiteBase33 extends ZorderWithCodegenEnabledSuiteBase {
+trait ZorderSuiteSpark33 extends ZorderSuiteBase {
test("Add rebalance before zorder") {
Seq("true" -> false, "false" -> true).foreach { case (useOriginalOrdering, zorder) =>
@@ -106,6 +107,18 @@ trait ZorderWithCodegenEnabledSuiteBase33 extends ZorderWithCodegenEnabledSuiteB
}
}
-class ZorderWithCodegenEnabledSuite extends ZorderWithCodegenEnabledSuiteBase33 {}
+trait ParserSuite { self: ZorderSuiteBase =>
+ override def createParser: ParserInterface = {
+ new SparkKyuubiSparkSQLParser(spark.sessionState.sqlParser)
+ }
+}
+
+class ZorderWithCodegenEnabledSuite
+ extends ZorderWithCodegenEnabledSuiteBase
+ with ZorderSuiteSpark33
+ with ParserSuite {}
-class ZorderWithCodegenDisabledSuite extends ZorderWithCodegenEnabledSuiteBase33 {}
+class ZorderWithCodegenDisabledSuite
+ extends ZorderWithCodegenDisabledSuiteBase
+ with ZorderSuiteSpark33
+ with ParserSuite {}
diff --git a/extensions/spark/kyuubi-spark-connector-kudu/pom.xml b/extensions/spark/kyuubi-extension-spark-3-4/pom.xml
similarity index 64%
rename from extensions/spark/kyuubi-spark-connector-kudu/pom.xml
rename to extensions/spark/kyuubi-extension-spark-3-4/pom.xml
index 97356cd9332..ee5b5f1558a 100644
--- a/extensions/spark/kyuubi-spark-connector-kudu/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-3-4/pom.xml
@@ -21,13 +21,13 @@
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-parent</artifactId>
-       <version>1.8.0-SNAPSHOT</version>
+       <version>1.9.0-SNAPSHOT</version>
        <relativePath>../../../pom.xml</relativePath>
    </parent>

-   <artifactId>kyuubi-spark-connector-kudu_2.12</artifactId>
+   <artifactId>kyuubi-extension-spark-3-4_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
-   <name>Kyuubi Spark Kudu Connector</name>
+   <name>Kyuubi Dev Spark Extensions (for Spark 3.4)</name>
    <url>https://kyuubi.apache.org/</url>
@@ -38,20 +38,14 @@
-       <dependency>
-           <groupId>org.apache.logging.log4j</groupId>
-           <artifactId>log4j-api</artifactId>
-           <scope>provided</scope>
-       </dependency>
-
        <dependency>
-           <groupId>org.apache.logging.log4j</groupId>
-           <artifactId>log4j-core</artifactId>
+           <groupId>org.apache.spark</groupId>
+           <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
-           <artifactId>spark-sql_${scala.binary.version}</artifactId>
+           <artifactId>spark-hive_${scala.binary.version}</artifactId>
            <scope>provided</scope>
        </dependency>
@@ -62,48 +56,45 @@
-       <dependency>
-           <groupId>org.apache.kudu</groupId>
-           <artifactId>kudu-client</artifactId>
-       </dependency>
-
        <dependency>
-           <groupId>org.apache.spark</groupId>
-           <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
-           <type>test-jar</type>
+           <groupId>org.apache.kyuubi</groupId>
+           <artifactId>kyuubi-download</artifactId>
+           <version>${project.version}</version>
+           <type>pom</type>
            <scope>test</scope>
        </dependency>

        <dependency>
-           <groupId>org.scalatestplus</groupId>
-           <artifactId>scalacheck-1-17_${scala.binary.version}</artifactId>
+           <groupId>org.apache.kyuubi</groupId>
+           <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+           <version>${project.version}</version>
+           <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
-           <groupId>com.dimafeng</groupId>
-           <artifactId>testcontainers-scala-scalatest_${scala.binary.version}</artifactId>
+           <groupId>org.apache.spark</groupId>
+           <artifactId>spark-core_${scala.binary.version}</artifactId>
+           <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
-           <artifactId>spark-sql_${scala.binary.version}</artifactId>
-           <version>${spark.version}</version>
+           <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
-           <groupId>org.apache.kyuubi</groupId>
-           <artifactId>kyuubi-common_${scala.binary.version}</artifactId>
-           <version>${project.version}</version>
+           <groupId>org.scalatestplus</groupId>
+           <artifactId>scalacheck-1-17_${scala.binary.version}</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
-           <groupId>org.apache.kyuubi</groupId>
-           <artifactId>kyuubi-common_${scala.binary.version}</artifactId>
-           <version>${project.version}</version>
+           <groupId>org.apache.spark</groupId>
+           <artifactId>spark-sql_${scala.binary.version}</artifactId>
+           <version>${spark.version}</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
@@ -136,16 +127,55 @@
            <artifactId>jakarta.xml.bind-api</artifactId>
            <scope>test</scope>
        </dependency>
+
+       <dependency>
+           <groupId>org.apache.logging.log4j</groupId>
+           <artifactId>log4j-slf4j-impl</artifactId>
+           <scope>test</scope>
+       </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
-               <groupId>org.apache.maven.plugins</groupId>
-               <artifactId>maven-dependency-plugin</artifactId>
+               <groupId>org.codehaus.mojo</groupId>
+               <artifactId>build-helper-maven-plugin</artifactId>
+               <executions>
+                   <execution>
+                       <id>regex-property</id>
+                       <goals>
+                           <goal>regex-property</goal>
+                       </goals>
+                       <configuration>
+                           <name>spark.home</name>
+                           <value>${project.basedir}/../../../externals/kyuubi-download/target/${spark.archive.name}</value>
+                           <regex>(.+)\.tgz</regex>
+                           <replacement>$1</replacement>
+                       </configuration>
+                   </execution>
+               </executions>
+           </plugin>
+
+           <plugin>
+               <groupId>org.scalatest</groupId>
+               <artifactId>scalatest-maven-plugin</artifactId>
+               <configuration>
+                   <environmentVariables>
+                       <SPARK_HOME>${spark.home}</SPARK_HOME>
+                       <SPARK_SCALA_VERSION>${scala.binary.version}</SPARK_SCALA_VERSION>
+                   </environmentVariables>
+               </configuration>
+           </plugin>
+
+           <plugin>
+               <groupId>org.antlr</groupId>
+               <artifactId>antlr4-maven-plugin</artifactId>
-                   <visitor>true</visitor>
+                   <visitor>true</visitor>
+                   <sourceDirectory>${project.basedir}/src/main/antlr4</sourceDirectory>
@@ -156,43 +186,9 @@
                    <createDependencyReducedPom>false</createDependencyReducedPom>
-                       <include>org.apache.kudu:kudu-client</include>
-                       <include>com.stumbleupon:async</include>
+                       <include>org.apache.kyuubi:*</include>
-                   <filters>
-                       <filter>
-                           <artifact>org.apache.kudu:kudu-client</artifact>
-                           <excludes>
-                               <exclude>META-INF/maven/**</exclude>
-                               <exclude>META-INF/native/**</exclude>
-                               <exclude>META-INF/native-image/**</exclude>
-                               <exclude>MANIFEST.MF</exclude>
-                               <exclude>LICENSE</exclude>
-                               <exclude>LICENSE.txt</exclude>
-                               <exclude>NOTICE</exclude>
-                               <exclude>NOTICE.txt</exclude>
-                               <exclude>*.properties</exclude>
-                               <exclude>**/*.proto</exclude>
-                           </excludes>
-                       </filter>
-                   </filters>
-                   <relocations>
-                       <relocation>
-                           <pattern>org.apache.kudu</pattern>
-                           <shadedPattern>${kyuubi.shade.packageName}.org.apache.kudu</shadedPattern>
-                           <includes>
-                               <include>org.apache.kudu.**</include>
-                           </includes>
-                       </relocation>
-                       <relocation>
-                           <pattern>com.stumbleupon:async</pattern>
-                           <shadedPattern>${kyuubi.shade.packageName}.com.stumbleupon.async</shadedPattern>
-                           <includes>
-                               <include>com.stumbleupon.async.**</include>
-                           </includes>
-                       </relocation>
-                   </relocations>
@@ -203,20 +199,6 @@
-           <plugin>
-               <groupId>org.apache.maven.plugins</groupId>
-               <artifactId>maven-jar-plugin</artifactId>
-               <executions>
-                   <execution>
-                       <id>prepare-test-jar</id>
-                       <goals>
-                           <goal>test-jar</goal>
-                       </goals>
-                       <phase>test-compile</phase>
-                   </execution>
-               </executions>
-           </plugin>
        </plugins>
        <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
        <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
    </build>
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4 b/extensions/spark/kyuubi-extension-spark-3-4/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4
new file mode 100644
index 00000000000..e52b7f5cfeb
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+grammar KyuubiSparkSQL;
+
+@members {
+ /**
+ * Verify whether current token is a valid decimal token (which contains dot).
+ * Returns true if the character that follows the token is not a digit or letter or underscore.
+ *
+ * For example:
+ * For char stream "2.3", "2." is not a valid decimal token, because it is followed by digit '3'.
+ * For char stream "2.3_", "2.3" is not a valid decimal token, because it is followed by '_'.
+ * For char stream "2.3W", "2.3" is not a valid decimal token, because it is followed by 'W'.
+ * For char stream "12.0D 34.E2+0.12 " 12.0D is a valid decimal token because it is followed
+ * by a space. 34.E2 is a valid decimal token because it is followed by symbol '+'
+ * which is not a digit or letter or underscore.
+ */
+ public boolean isValidDecimal() {
+ int nextChar = _input.LA(1);
+ if (nextChar >= 'A' && nextChar <= 'Z' || nextChar >= '0' && nextChar <= '9' ||
+ nextChar == '_') {
+ return false;
+ } else {
+ return true;
+ }
+ }
+ }
+
+tokens {
+ DELIMITER
+}
+
+singleStatement
+ : statement EOF
+ ;
+
+statement
+ : OPTIMIZE multipartIdentifier whereClause? zorderClause #optimizeZorder
+ | .*? #passThrough
+ ;
+
+whereClause
+ : WHERE partitionPredicate = predicateToken
+ ;
+
+zorderClause
+ : ZORDER BY order+=multipartIdentifier (',' order+=multipartIdentifier)*
+ ;
+
+// We don't have an expression rule in our grammar here, so we just grab the tokens and defer
+// parsing them to later.
+predicateToken
+ : .+?
+ ;
+
+multipartIdentifier
+ : parts+=identifier ('.' parts+=identifier)*
+ ;
+
+identifier
+ : strictIdentifier
+ ;
+
+strictIdentifier
+ : IDENTIFIER #unquotedIdentifier
+ | quotedIdentifier #quotedIdentifierAlternative
+ | nonReserved #unquotedIdentifier
+ ;
+
+quotedIdentifier
+ : BACKQUOTED_IDENTIFIER
+ ;
+
+nonReserved
+ : AND
+ | BY
+ | FALSE
+ | DATE
+ | INTERVAL
+ | OPTIMIZE
+ | OR
+ | TABLE
+ | TIMESTAMP
+ | TRUE
+ | WHERE
+ | ZORDER
+ ;
+
+AND: 'AND';
+BY: 'BY';
+FALSE: 'FALSE';
+DATE: 'DATE';
+INTERVAL: 'INTERVAL';
+OPTIMIZE: 'OPTIMIZE';
+OR: 'OR';
+TABLE: 'TABLE';
+TIMESTAMP: 'TIMESTAMP';
+TRUE: 'TRUE';
+WHERE: 'WHERE';
+ZORDER: 'ZORDER';
+
+MINUS: '-';
+
+BIGINT_LITERAL
+ : DIGIT+ 'L'
+ ;
+
+SMALLINT_LITERAL
+ : DIGIT+ 'S'
+ ;
+
+TINYINT_LITERAL
+ : DIGIT+ 'Y'
+ ;
+
+INTEGER_VALUE
+ : DIGIT+
+ ;
+
+DECIMAL_VALUE
+ : DIGIT+ EXPONENT
+ | DECIMAL_DIGITS EXPONENT? {isValidDecimal()}?
+ ;
+
+DOUBLE_LITERAL
+ : DIGIT+ EXPONENT? 'D'
+ | DECIMAL_DIGITS EXPONENT? 'D' {isValidDecimal()}?
+ ;
+
+BIGDECIMAL_LITERAL
+ : DIGIT+ EXPONENT? 'BD'
+ | DECIMAL_DIGITS EXPONENT? 'BD' {isValidDecimal()}?
+ ;
+
+BACKQUOTED_IDENTIFIER
+ : '`' ( ~'`' | '``' )* '`'
+ ;
+
+IDENTIFIER
+ : (LETTER | DIGIT | '_')+
+ ;
+
+fragment DECIMAL_DIGITS
+ : DIGIT+ '.' DIGIT*
+ | '.' DIGIT+
+ ;
+
+fragment EXPONENT
+ : 'E' [+-]? DIGIT+
+ ;
+
+fragment DIGIT
+ : [0-9]
+ ;
+
+fragment LETTER
+ : [A-Z]
+ ;
+
+SIMPLE_COMMENT
+ : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
+ ;
+
+BRACKETED_COMMENT
+ : '/*' .*? '*/' -> channel(HIDDEN)
+ ;
+
+WS : [ \r\n\t]+ -> channel(HIDDEN)
+ ;
+
+// Catch-all for anything we can't recognize.
+// We use this to be able to ignore and recover all the text
+// when splitting statements with DelimiterLexer
+UNRECOGNIZED
+ : .
+ ;
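
As a rough sketch of what this grammar accepts: the `optimizeZorder` alternative matches an `OPTIMIZE ... [WHERE ...] ZORDER BY ...` statement, while everything else falls into `passThrough`. A hypothetical usage, assuming an active `SparkSession` named `spark` and the parser wiring used elsewhere in this patch (the table and column names below are made up):

```scala
// Wrap Spark's parser with the Kyuubi extension parser, as the test suites do.
val parser = new SparkKyuubiSparkSQLParser(spark.sessionState.sqlParser)

// Matches the `optimizeZorder` alternative. Note the WHERE predicate is
// captured as raw tokens by `predicateToken` (the grammar has no expression
// rule) and is parsed later by the delegate Spark parser.
parser.parsePlan("OPTIMIZE db1.tbl WHERE day = '2023-01-01' ZORDER BY c1, c2")

// Any other statement hits `passThrough` and is delegated to Spark unchanged.
parser.parsePlan("SELECT 1")
```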
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/DropIgnoreNonexistent.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/DropIgnoreNonexistent.scala
new file mode 100644
index 00000000000..e33632b8b30
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/DropIgnoreNonexistent.scala
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedFunctionName, UnresolvedRelation}
+import org.apache.spark.sql.catalyst.plans.logical.{DropFunction, DropNamespace, LogicalPlan, NoopCommand, UncacheTable}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.{AlterTableDropPartitionCommand, DropTableCommand}
+
+import org.apache.kyuubi.sql.KyuubiSQLConf._
+
+case class DropIgnoreNonexistent(session: SparkSession) extends Rule[LogicalPlan] {
+
+ override def apply(plan: LogicalPlan): LogicalPlan = {
+ if (conf.getConf(DROP_IGNORE_NONEXISTENT)) {
+ plan match {
+ case i @ AlterTableDropPartitionCommand(_, _, false, _, _) =>
+ i.copy(ifExists = true)
+ case i @ DropTableCommand(_, false, _, _) =>
+ i.copy(ifExists = true)
+ case i @ DropNamespace(_, false, _) =>
+ i.copy(ifExists = true)
+ case UncacheTable(u: UnresolvedRelation, false, _) =>
+ NoopCommand("UNCACHE TABLE", u.multipartIdentifier)
+ case DropFunction(u: UnresolvedFunctionName, false) =>
+ NoopCommand("DROP FUNCTION", u.multipartIdentifier)
+ case _ => plan
+ }
+ } else {
+ plan
+ }
+ }
+
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InferRebalanceAndSortOrders.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InferRebalanceAndSortOrders.scala
new file mode 100644
index 00000000000..fcbf5c0a122
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InferRebalanceAndSortOrders.scala
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import scala.annotation.tailrec
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeSet, Expression, NamedExpression, UnaryExpression}
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.{FullOuter, Inner, LeftAnti, LeftOuter, LeftSemi, RightOuter}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, LogicalPlan, Project, Sort, SubqueryAlias, View}
+
+/**
+ * Infer the columns for Rebalance and Sort to improve the compression ratio.
+ *
+ * For example
+ * {{{
+ * INSERT INTO TABLE t PARTITION(p='a')
+ * SELECT * FROM t1 JOIN t2 on t1.c1 = t2.c1
+ * }}}
+ * the inferred columns are: t1.c1
+ */
+object InferRebalanceAndSortOrders {
+
+ type PartitioningAndOrdering = (Seq[Expression], Seq[Expression])
+
+ private def getAliasMap(named: Seq[NamedExpression]): Map[Expression, Attribute] = {
+ @tailrec
+ def throughUnary(e: Expression): Expression = e match {
+ case u: UnaryExpression if u.deterministic =>
+ throughUnary(u.child)
+ case _ => e
+ }
+
+ named.flatMap {
+ case a @ Alias(child, _) =>
+ Some((throughUnary(child).canonicalized, a.toAttribute))
+ case _ => None
+ }.toMap
+ }
+
+ def infer(plan: LogicalPlan): Option[PartitioningAndOrdering] = {
+ def candidateKeys(
+ input: LogicalPlan,
+ output: AttributeSet = AttributeSet.empty): Option[PartitioningAndOrdering] = {
+ input match {
+ case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, _, _, _, _, _) =>
+ joinType match {
+ case LeftSemi | LeftAnti | LeftOuter => Some((leftKeys, leftKeys))
+ case RightOuter => Some((rightKeys, rightKeys))
+ case Inner | FullOuter =>
+ if (output.isEmpty) {
+ Some((leftKeys ++ rightKeys, leftKeys ++ rightKeys))
+ } else {
+ assert(leftKeys.length == rightKeys.length)
+ val keys = leftKeys.zip(rightKeys).flatMap { case (left, right) =>
+ if (left.references.subsetOf(output)) {
+ Some(left)
+ } else if (right.references.subsetOf(output)) {
+ Some(right)
+ } else {
+ None
+ }
+ }
+ Some((keys, keys))
+ }
+ case _ => None
+ }
+ case agg: Aggregate =>
+ val aliasMap = getAliasMap(agg.aggregateExpressions)
+ Some((
+ agg.groupingExpressions.map(p => aliasMap.getOrElse(p.canonicalized, p)),
+ agg.groupingExpressions.map(o => aliasMap.getOrElse(o.canonicalized, o))))
+ case s: Sort => Some((s.order.map(_.child), s.order.map(_.child)))
+ case p: Project =>
+ val aliasMap = getAliasMap(p.projectList)
+ candidateKeys(p.child, p.references).map { case (partitioning, ordering) =>
+ (
+ partitioning.map(p => aliasMap.getOrElse(p.canonicalized, p)),
+ ordering.map(o => aliasMap.getOrElse(o.canonicalized, o)))
+ }
+ case f: Filter => candidateKeys(f.child, output)
+ case s: SubqueryAlias => candidateKeys(s.child, output)
+ case v: View => candidateKeys(v.child, output)
+
+ case _ => None
+ }
+ }
+
+ candidateKeys(plan).map { case (partitioning, ordering) =>
+ (
+ partitioning.filter(_.references.subsetOf(plan.outputSet)),
+ ordering.filter(_.references.subsetOf(plan.outputSet)))
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InsertShuffleNodeBeforeJoin.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InsertShuffleNodeBeforeJoin.scala
new file mode 100644
index 00000000000..1a02e8c1e67
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/InsertShuffleNodeBeforeJoin.scala
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.catalyst.plans.physical.Distribution
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{SortExec, SparkPlan}
+import org.apache.spark.sql.execution.adaptive.QueryStageExec
+import org.apache.spark.sql.execution.aggregate.BaseAggregateExec
+import org.apache.spark.sql.execution.exchange.{Exchange, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, SortMergeJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+import org.apache.kyuubi.sql.KyuubiSQLConf._
+
+/**
+ * Insert a shuffle node before a join if one doesn't exist, so that `OptimizeSkewedJoin` works.
+ */
+object InsertShuffleNodeBeforeJoin extends Rule[SparkPlan] {
+
+ override def apply(plan: SparkPlan): SparkPlan = {
+ // this rule has no meaning without AQE
+ if (!conf.getConf(FORCE_SHUFFLE_BEFORE_JOIN) ||
+ !conf.getConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED)) {
+ return plan
+ }
+
+ val newPlan = insertShuffleBeforeJoin(plan)
+ if (plan.fastEquals(newPlan)) {
+ plan
+ } else {
+ // make sure the output partitioning and ordering will not be broken.
+ KyuubiEnsureRequirements.apply(newPlan)
+ }
+ }
+
+ // Since Spark 3.3, insertShuffleBeforeJoin shouldn't be applied if the join is skewed.
+ private def insertShuffleBeforeJoin(plan: SparkPlan): SparkPlan = plan transformUp {
+ case smj @ SortMergeJoinExec(_, _, _, _, l, r, isSkewJoin) if !isSkewJoin =>
+ smj.withNewChildren(checkAndInsertShuffle(smj.requiredChildDistribution.head, l) ::
+ checkAndInsertShuffle(smj.requiredChildDistribution(1), r) :: Nil)
+
+ case shj: ShuffledHashJoinExec if !shj.isSkewJoin =>
+ if (!shj.left.isInstanceOf[Exchange] && !shj.right.isInstanceOf[Exchange]) {
+ shj.withNewChildren(withShuffleExec(shj.requiredChildDistribution.head, shj.left) ::
+ withShuffleExec(shj.requiredChildDistribution(1), shj.right) :: Nil)
+ } else if (!shj.left.isInstanceOf[Exchange]) {
+ shj.withNewChildren(
+ withShuffleExec(shj.requiredChildDistribution.head, shj.left) :: shj.right :: Nil)
+ } else if (!shj.right.isInstanceOf[Exchange]) {
+ shj.withNewChildren(
+ shj.left :: withShuffleExec(shj.requiredChildDistribution(1), shj.right) :: Nil)
+ } else {
+ shj
+ }
+ }
+
+ private def checkAndInsertShuffle(
+ distribution: Distribution,
+ child: SparkPlan): SparkPlan = child match {
+ case SortExec(_, _, _: Exchange, _) =>
+ child
+ case SortExec(_, _, _: QueryStageExec, _) =>
+ child
+ case sort @ SortExec(_, _, agg: BaseAggregateExec, _) =>
+ sort.withNewChildren(withShuffleExec(distribution, agg) :: Nil)
+ case _ =>
+ withShuffleExec(distribution, child)
+ }
+
+ private def withShuffleExec(distribution: Distribution, child: SparkPlan): SparkPlan = {
+ val numPartitions = distribution.requiredNumPartitions
+ .getOrElse(conf.numShufflePartitions)
+ ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+ }
+}
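Reviewer note: a minimal activation sketch for this rule, assuming a standard spark-shell session (`spark` is the usual `SparkSession` handle, not something introduced by this patch). Both config keys below are defined in `KyuubiSQLConf` in this patch; the rule is a no-op unless AQE and the force-shuffle flag are both enabled.

```scala
// Hypothetical spark-shell snippet; the second key defaults to false in KyuubiSQLConf.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.forceShuffleBeforeJoin.enabled", "true")
```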
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiEnsureRequirements.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiEnsureRequirements.scala
new file mode 100644
index 00000000000..a17e0a4652b
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiEnsureRequirements.scala
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.plans.physical.{BroadcastDistribution, Distribution, UnspecifiedDistribution}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{SortExec, SparkPlan}
+import org.apache.spark.sql.execution.exchange.{BroadcastExchangeExec, ShuffleExchangeExec}
+
+/**
+ * Copied from Apache Spark's `EnsureRequirements`, with two changes:
+ *   1. removed join predicate reordering
+ *   2. removed shuffle pruning
+ */
+object KyuubiEnsureRequirements extends Rule[SparkPlan] {
+ private def ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan = {
+ val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution
+ val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering
+ var children: Seq[SparkPlan] = operator.children
+ assert(requiredChildDistributions.length == children.length)
+ assert(requiredChildOrderings.length == children.length)
+
+ // Ensure that the operator's children satisfy their output distribution requirements.
+ children = children.zip(requiredChildDistributions).map {
+ case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
+ child
+ case (child, BroadcastDistribution(mode)) =>
+ BroadcastExchangeExec(mode, child)
+ case (child, distribution) =>
+ val numPartitions = distribution.requiredNumPartitions
+ .getOrElse(conf.numShufflePartitions)
+ ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+ }
+
+ // Get the indexes of children which have specified distribution requirements and need to have
+ // same number of partitions.
+ val childrenIndexes = requiredChildDistributions.zipWithIndex.filter {
+ case (UnspecifiedDistribution, _) => false
+ case (_: BroadcastDistribution, _) => false
+ case _ => true
+ }.map(_._2)
+
+ val childrenNumPartitions =
+ childrenIndexes.map(children(_).outputPartitioning.numPartitions).toSet
+
+ if (childrenNumPartitions.size > 1) {
+ // Get the number of partitions which is explicitly required by the distributions.
+ val requiredNumPartitions = {
+ val numPartitionsSet = childrenIndexes.flatMap {
+ index => requiredChildDistributions(index).requiredNumPartitions
+ }.toSet
+ assert(
+ numPartitionsSet.size <= 1,
+ s"$operator has incompatible requirements of the number of partitions for its children")
+ numPartitionsSet.headOption
+ }
+
+ // If there are non-shuffle children that satisfy the required distribution, we have
+ // some tradeoffs when picking the expected number of shuffle partitions:
+ // 1. We should avoid shuffling these children.
+ // 2. We should have a reasonable parallelism.
+ val nonShuffleChildrenNumPartitions =
+ childrenIndexes.map(children).filterNot(_.isInstanceOf[ShuffleExchangeExec])
+ .map(_.outputPartitioning.numPartitions)
+ val expectedChildrenNumPartitions =
+ if (nonShuffleChildrenNumPartitions.nonEmpty) {
+ if (nonShuffleChildrenNumPartitions.length == childrenIndexes.length) {
+ // Here we pick the max number of partitions among these non-shuffle children.
+ nonShuffleChildrenNumPartitions.max
+ } else {
+ // Here we pick the max number of partitions among these non-shuffle children as the
+ // expected number of shuffle partitions. However, if it's smaller than
+ // `conf.numShufflePartitions`, we pick `conf.numShufflePartitions` as the
+ // expected number of shuffle partitions.
+ math.max(nonShuffleChildrenNumPartitions.max, conf.defaultNumShufflePartitions)
+ }
+ } else {
+ childrenNumPartitions.max
+ }
+
+ val targetNumPartitions = requiredNumPartitions.getOrElse(expectedChildrenNumPartitions)
+
+ children = children.zip(requiredChildDistributions).zipWithIndex.map {
+ case ((child, distribution), index) if childrenIndexes.contains(index) =>
+ if (child.outputPartitioning.numPartitions == targetNumPartitions) {
+ child
+ } else {
+ val defaultPartitioning = distribution.createPartitioning(targetNumPartitions)
+ child match {
+ // If child is an exchange, we replace it with a new one having defaultPartitioning.
+ case ShuffleExchangeExec(_, c, _) => ShuffleExchangeExec(defaultPartitioning, c)
+ case _ => ShuffleExchangeExec(defaultPartitioning, child)
+ }
+ }
+
+ case ((child, _), _) => child
+ }
+ }
+
+ // Now that we've performed any necessary shuffles, add sorts to guarantee output orderings:
+ children = children.zip(requiredChildOrderings).map { case (child, requiredOrdering) =>
+ // If child.outputOrdering already satisfies the requiredOrdering, we do not need to sort.
+ if (SortOrder.orderingSatisfies(child.outputOrdering, requiredOrdering)) {
+ child
+ } else {
+ SortExec(requiredOrdering, global = false, child = child)
+ }
+ }
+
+ operator.withNewChildren(children)
+ }
+
+ def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
+ case operator: SparkPlan =>
+ ensureDistributionAndOrdering(operator)
+ }
+}
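The partition-count reconciliation above (the branch that picks `expectedChildrenNumPartitions`) can be summarized as a pure function. This is an illustrative mirror of the logic for review purposes, not the Spark API; `targetNumPartitions` and its parameter names are invented for the sketch.

```scala
// Illustrative mirror of KyuubiEnsureRequirements' partition-count choice.
// required:   a partition count explicitly required by the distributions, if any
// nonShuffle: output partition counts of children that are not ShuffleExchangeExec
// all:        output partition counts of all children that must share a partition count
// default:    stand-in for conf.defaultNumShufflePartitions
def targetNumPartitions(
    required: Option[Int],
    nonShuffle: Seq[Int],
    all: Seq[Int],
    default: Int): Int =
  required.getOrElse {
    if (nonShuffle.nonEmpty) {
      // avoid re-shuffling children that already satisfy the distribution,
      // but keep a reasonable parallelism
      if (nonShuffle.length == all.length) nonShuffle.max
      else math.max(nonShuffle.max, default)
    } else {
      all.max
    }
  }
```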
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiQueryStagePreparation.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiQueryStagePreparation.scala
new file mode 100644
index 00000000000..a7fcbecd422
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiQueryStagePreparation.scala
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.adaptive.QueryStageExec
+import org.apache.spark.sql.execution.command.{ResetCommand, SetCommand}
+import org.apache.spark.sql.execution.exchange.{BroadcastExchangeLike, ReusedExchangeExec, ShuffleExchangeLike}
+import org.apache.spark.sql.internal.SQLConf
+
+import org.apache.kyuubi.sql.KyuubiSQLConf._
+
+/**
+ * This rule splits stages into two parts:
+ *   1. previous stages
+ *   2. the final stage
+ * For the final stage, we can inject extra configs. This is useful when we use repartition to
+ * optimize small files, which needs a bigger shuffle partition size than the previous stages.
+ *
+ * Let's say we have a query with 3 stages; then the logic works like:
+ *
+ * Set/Reset Command -> clean up the previousStage config if the user sets the spark config.
+ * Query -> AQE -> stage1 -> preparation (use previousStage to overwrite spark config)
+ * -> AQE -> stage2 -> preparation (use spark config)
+ * -> AQE -> stage3 -> preparation (use finalStage config to overwrite spark config,
+ * store spark config to previousStage.)
+ *
+ * An example of the new finalStage config:
+ * `spark.sql.adaptive.advisoryPartitionSizeInBytes` ->
+ * `spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes`
+ */
+case class FinalStageConfigIsolation(session: SparkSession) extends Rule[SparkPlan] {
+ import FinalStageConfigIsolation._
+
+ override def apply(plan: SparkPlan): SparkPlan = {
+ // this rule has no meaning without AQE
+ if (!conf.getConf(FINAL_STAGE_CONFIG_ISOLATION) ||
+ !conf.getConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED)) {
+ return plan
+ }
+
+ if (isFinalStage(plan)) {
+ // We cannot get the whole plan at the query preparation phase to detect whether the
+ // current plan is for writing, so we depend on a tag injected at the post-resolution phase.
+ // Note: we should still clean up the previous config for non-final stages, to avoid cases
+ // like: the first statement is a write, but the second statement is a query.
+ if (conf.getConf(FINAL_STAGE_CONFIG_ISOLATION_WRITE_ONLY) &&
+ !WriteUtils.isWrite(session, plan)) {
+ return plan
+ }
+
+ // set config for final stage
+ session.conf.getAll.filter(_._1.startsWith(FINAL_STAGE_CONFIG_PREFIX)).foreach {
+ case (k, v) =>
+ val sparkConfigKey = s"spark.sql.${k.substring(FINAL_STAGE_CONFIG_PREFIX.length)}"
+ val previousStageConfigKey =
+ s"$PREVIOUS_STAGE_CONFIG_PREFIX${k.substring(FINAL_STAGE_CONFIG_PREFIX.length)}"
+ // store the previous config only if we have not stored it yet, to avoid queries with
+ // only one stage overwriting the real config.
+ if (!session.sessionState.conf.contains(previousStageConfigKey)) {
+ val originalValue =
+ if (session.conf.getOption(sparkConfigKey).isDefined) {
+ session.sessionState.conf.getConfString(sparkConfigKey)
+ } else {
+ // the default value of the config is None, so we need to use an internal tag
+ INTERNAL_UNSET_CONFIG_TAG
+ }
+ logInfo(s"Store config: $sparkConfigKey to previousStage, " +
+ s"original value: $originalValue ")
+ session.sessionState.conf.setConfString(previousStageConfigKey, originalValue)
+ }
+ logInfo(s"For final stage: set $sparkConfigKey = $v.")
+ session.conf.set(sparkConfigKey, v)
+ }
+ } else {
+ // reset config for previous stage
+ session.conf.getAll.filter(_._1.startsWith(PREVIOUS_STAGE_CONFIG_PREFIX)).foreach {
+ case (k, v) =>
+ val sparkConfigKey = s"spark.sql.${k.substring(PREVIOUS_STAGE_CONFIG_PREFIX.length)}"
+ logInfo(s"For previous stage: set $sparkConfigKey = $v.")
+ if (v == INTERNAL_UNSET_CONFIG_TAG) {
+ session.conf.unset(sparkConfigKey)
+ } else {
+ session.conf.set(sparkConfigKey, v)
+ }
+ // unset config so that we do not need to reset configs for every previous stage
+ session.conf.unset(k)
+ }
+ }
+
+ plan
+ }
+
+ /**
+ * The current formula depends on AQE in Spark 3.1.1; it may not work in future versions.
+ */
+ private def isFinalStage(plan: SparkPlan): Boolean = {
+ var shuffleNum = 0
+ var broadcastNum = 0
+ var reusedNum = 0
+ var queryStageNum = 0
+
+ def collectNumber(p: SparkPlan): SparkPlan = {
+ p transform {
+ case shuffle: ShuffleExchangeLike =>
+ shuffleNum += 1
+ shuffle
+
+ case broadcast: BroadcastExchangeLike =>
+ broadcastNum += 1
+ broadcast
+
+ case reusedExchangeExec: ReusedExchangeExec =>
+ reusedNum += 1
+ reusedExchangeExec
+
+ // a query stage is a leaf node, so we need to transform it manually
+ // compatible with Spark 3.5:
+ // SPARK-42101: table cache is an independent query stage, so we do not need to include it.
+ case queryStage: QueryStageExec if queryStage.nodeName != "TableCacheQueryStage" =>
+ queryStageNum += 1
+ collectNumber(queryStage.plan)
+ queryStage
+ }
+ }
+ collectNumber(plan)
+
+ if (shuffleNum == 0) {
+ // we do not care about broadcast stages here since they won't change the partition number.
+ true
+ } else if (shuffleNum + broadcastNum + reusedNum == queryStageNum) {
+ true
+ } else {
+ false
+ }
+ }
+}
+object FinalStageConfigIsolation {
+ final val SQL_PREFIX = "spark.sql."
+ final val FINAL_STAGE_CONFIG_PREFIX = "spark.sql.finalStage."
+ final val PREVIOUS_STAGE_CONFIG_PREFIX = "spark.sql.previousStage."
+ final val INTERNAL_UNSET_CONFIG_TAG = "__INTERNAL_UNSET_CONFIG_TAG__"
+
+ def getPreviousStageConfigKey(configKey: String): Option[String] = {
+ if (configKey.startsWith(SQL_PREFIX)) {
+ Some(s"$PREVIOUS_STAGE_CONFIG_PREFIX${configKey.substring(SQL_PREFIX.length)}")
+ } else {
+ None
+ }
+ }
+}
+
+case class FinalStageConfigIsolationCleanRule(session: SparkSession) extends Rule[LogicalPlan] {
+ import FinalStageConfigIsolation._
+
+ override def apply(plan: LogicalPlan): LogicalPlan = plan match {
+ case set @ SetCommand(Some((k, Some(_)))) if k.startsWith(SQL_PREFIX) =>
+ checkAndUnsetPreviousStageConfig(k)
+ set
+
+ case reset @ ResetCommand(Some(k)) if k.startsWith(SQL_PREFIX) =>
+ checkAndUnsetPreviousStageConfig(k)
+ reset
+
+ case other => other
+ }
+
+ private def checkAndUnsetPreviousStageConfig(configKey: String): Unit = {
+ getPreviousStageConfigKey(configKey).foreach { previousStageConfigKey =>
+ if (session.sessionState.conf.contains(previousStageConfigKey)) {
+ logInfo(s"For previous stage: unset $previousStageConfigKey")
+ session.conf.unset(previousStageConfigKey)
+ }
+ }
+ }
+}
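The key translation that `FinalStageConfigIsolation` performs can be shown as two plain string functions. This is a hedged, standalone sketch: `FinalStageKeyDemo` and its method names are invented here for illustration, while the prefixes match the constants in the companion object above.

```scala
// Minimal sketch of the config-key translation in FinalStageConfigIsolation.
// Not the real API; only the prefix constants mirror the patch.
object FinalStageKeyDemo {
  val SqlPrefix = "spark.sql."
  val FinalStagePrefix = "spark.sql.finalStage."
  val PreviousStagePrefix = "spark.sql.previousStage."

  // spark.sql.finalStage.X -> spark.sql.X (applied when the final stage runs)
  def toSparkKey(finalStageKey: String): String =
    s"spark.sql.${finalStageKey.substring(FinalStagePrefix.length)}"

  // spark.sql.X -> spark.sql.previousStage.X (backup slot for the original value)
  def toPreviousStageKey(sparkKey: String): Option[String] =
    if (sparkKey.startsWith(SqlPrefix)) {
      Some(s"$PreviousStagePrefix${sparkKey.substring(SqlPrefix.length)}")
    } else {
      None
    }
}
```

For example, `spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes` maps to `spark.sql.adaptive.advisoryPartitionSizeInBytes` for the final stage, and its original value is parked under `spark.sql.previousStage.adaptive.advisoryPartitionSizeInBytes`.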
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala
new file mode 100644
index 00000000000..6f45dae126e
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.network.util.ByteUnit
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.internal.SQLConf._
+
+object KyuubiSQLConf {
+
+ val INSERT_REPARTITION_BEFORE_WRITE =
+ buildConf("spark.sql.optimizer.insertRepartitionBeforeWrite.enabled")
+ .doc("Add a repartition node at the top of the query plan; an approach to merging small files.")
+ .version("1.2.0")
+ .booleanConf
+ .createWithDefault(true)
+
+ val INSERT_REPARTITION_NUM =
+ buildConf("spark.sql.optimizer.insertRepartitionNum")
+ .doc(s"The partition number if ${INSERT_REPARTITION_BEFORE_WRITE.key} is enabled. " +
+ s"If AQE is disabled, the default value is ${SQLConf.SHUFFLE_PARTITIONS.key}. " +
+ "If AQE is enabled, the default value is none, which means it depends on AQE. " +
+ "This config is used for Spark 3.1 only.")
+ .version("1.2.0")
+ .intConf
+ .createOptional
+
+ val DYNAMIC_PARTITION_INSERTION_REPARTITION_NUM =
+ buildConf("spark.sql.optimizer.dynamicPartitionInsertionRepartitionNum")
+ .doc(s"The partition number of each dynamic partition if " +
+ s"${INSERT_REPARTITION_BEFORE_WRITE.key} is enabled. " +
+ "We will repartition by the dynamic partition columns to reduce small files, but that " +
+ "can cause data skew. This config extends the partitioning of the dynamic partition " +
+ "columns to avoid skew, though it may generate some small files.")
+ .version("1.2.0")
+ .intConf
+ .createWithDefault(100)
+
+ val FORCE_SHUFFLE_BEFORE_JOIN =
+ buildConf("spark.sql.optimizer.forceShuffleBeforeJoin.enabled")
+ .doc("Ensure a shuffle node exists before a shuffled join (shj and smj) so that AQE's " +
+ "`OptimizeSkewedJoin` works (complex join scenarios, multi-table joins).")
+ .version("1.2.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_STAGE_CONFIG_ISOLATION =
+ buildConf("spark.sql.optimizer.finalStageConfigIsolation.enabled")
+ .doc("If true, the final stage supports using a different config from previous stages. " +
+ "The prefix of a final stage config key should be `spark.sql.finalStage.`. " +
+ "For example, given the raw spark config `spark.sql.adaptive.advisoryPartitionSizeInBytes`, " +
+ "the final stage config should be: " +
+ "`spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes`.")
+ .version("1.2.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val SQL_CLASSIFICATION = "spark.sql.analyzer.classification"
+ val SQL_CLASSIFICATION_ENABLED =
+ buildConf("spark.sql.analyzer.classification.enabled")
+ .doc("When true, allows the Kyuubi engine to judge this SQL's classification " +
+ s"and set `$SQL_CLASSIFICATION` back into sessionConf. " +
+ "Through this configuration item, Spark can optimize its configuration dynamically.")
+ .version("1.4.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val INSERT_ZORDER_BEFORE_WRITING =
+ buildConf("spark.sql.optimizer.insertZorderBeforeWriting.enabled")
+ .doc("When true, we will follow the target table's properties to decide whether to insert " +
+ "zorder. The key properties are: 1) kyuubi.zorder.enabled: if this property is true, we " +
+ "will insert zorder before writing data; 2) kyuubi.zorder.cols: a comma-separated string, " +
+ "we will zorder by these cols.")
+ .version("1.4.0")
+ .booleanConf
+ .createWithDefault(true)
+
+ val ZORDER_GLOBAL_SORT_ENABLED =
+ buildConf("spark.sql.optimizer.zorderGlobalSort.enabled")
+ .doc("When true, we do a global sort using zorder. Note that it can cause a data skew " +
+ "issue if the zorder columns have low cardinality. When false, we only do a local sort " +
+ "using zorder.")
+ .version("1.4.0")
+ .booleanConf
+ .createWithDefault(true)
+
+ val REBALANCE_BEFORE_ZORDER =
+ buildConf("spark.sql.optimizer.rebalanceBeforeZorder.enabled")
+ .doc("When true, we do a rebalance before zorder in case of data skew. " +
+ "Note that, if the insertion is a dynamic partition insert, we will use the partition " +
+ "columns to rebalance. Note that this config only takes effect with Spark 3.3.x.")
+ .version("1.6.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val REBALANCE_ZORDER_COLUMNS_ENABLED =
+ buildConf("spark.sql.optimizer.rebalanceZorderColumns.enabled")
+ .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do a rebalance before " +
+ s"Z-Order. If it's a dynamic partition insert, the rebalance expression will include " +
+ s"both partition columns and Z-Order columns. Note that this config only " +
+ s"takes effect with Spark 3.3.x.")
+ .version("1.6.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val TWO_PHASE_REBALANCE_BEFORE_ZORDER =
+ buildConf("spark.sql.optimizer.twoPhaseRebalanceBeforeZorder.enabled")
+ .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do a two-phase rebalance " +
+ s"before Z-Order for dynamic partition writes. The first phase rebalances using the " +
+ s"dynamic partition columns; the second phase rebalances using the dynamic partition " +
+ s"columns + Z-Order columns. Note that this config only takes effect with Spark 3.3.x.")
+ .version("1.6.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val ZORDER_USING_ORIGINAL_ORDERING_ENABLED =
+ buildConf("spark.sql.optimizer.zorderUsingOriginalOrdering.enabled")
+ .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we sort by the original " +
+ s"ordering, i.e. lexicographical order. Note that this config only takes effect " +
+ s"with Spark 3.3.x.")
+ .version("1.6.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val WATCHDOG_MAX_PARTITIONS =
+ buildConf("spark.sql.watchdog.maxPartitions")
+ .doc("Set the max partition number when spark scans a data source. " +
+ "Enable the maxPartitions strategy by specifying this configuration. " +
+ "The maxPartitions strategy avoids scanning excessive partitions " +
+ "of a partitioned table; it is optional and only takes effect when defined.")
+ .version("1.4.0")
+ .intConf
+ .createOptional
+
+ val WATCHDOG_MAX_FILE_SIZE =
+ buildConf("spark.sql.watchdog.maxFileSize")
+ .doc("Set the maximum size in bytes of files when spark scans a data source. " +
+ "Enable the maxFileSize strategy by specifying this configuration. " +
+ "The maxFileSize strategy avoids scanning an excessive total size of files; " +
+ "it is optional and only takes effect when defined.")
+ .version("1.8.0")
+ .bytesConf(ByteUnit.BYTE)
+ .createOptional
+
+ val WATCHDOG_FORCED_MAXOUTPUTROWS =
+ buildConf("spark.sql.watchdog.forcedMaxOutputRows")
+ .doc("Add the ForcedMaxOutputRows rule to avoid unexpectedly huge output rows from " +
+ "non-limit queries; it is optional and only takes effect when defined.")
+ .version("1.4.0")
+ .intConf
+ .createOptional
+
+ val DROP_IGNORE_NONEXISTENT =
+ buildConf("spark.sql.optimizer.dropIgnoreNonExistent")
+ .doc("Do not report an error if DROP DATABASE/TABLE/VIEW/FUNCTION/PARTITION specifies " +
+ "a non-existent database/table/view/function/partition")
+ .version("1.5.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val INFER_REBALANCE_AND_SORT_ORDERS =
+ buildConf("spark.sql.optimizer.inferRebalanceAndSortOrders.enabled")
+ .doc("When true, infer columns for rebalance and sort orders from the original query, " +
+ "e.g. the join keys from a join. It can avoid compression ratio regression.")
+ .version("1.7.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val INFER_REBALANCE_AND_SORT_ORDERS_MAX_COLUMNS =
+ buildConf("spark.sql.optimizer.inferRebalanceAndSortOrdersMaxColumns")
+ .doc("The max number of inferred columns.")
+ .version("1.7.0")
+ .intConf
+ .checkValue(_ > 0, "must be a positive number")
+ .createWithDefault(3)
+
+ val INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE =
+ buildConf("spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled")
+ .doc("When true, add repartition even if the original plan does not have shuffle.")
+ .version("1.7.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_STAGE_CONFIG_ISOLATION_WRITE_ONLY =
+ buildConf("spark.sql.optimizer.finalStageConfigIsolationWriteOnly.enabled")
+ .doc("When true, only enable final stage isolation for writing.")
+ .version("1.7.0")
+ .booleanConf
+ .createWithDefault(true)
+
+ val FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_ENABLED =
+ buildConf("spark.sql.finalWriteStage.eagerlyKillExecutors.enabled")
+ .doc("When true, eagerly kill redundant executors before running final write stage.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL =
+ buildConf("spark.sql.finalWriteStage.eagerlyKillExecutors.killAll")
+ .doc("When true, eagerly kill all executors before running the final write stage. " +
+ "Mainly for testing.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_WRITE_STAGE_SKIP_KILLING_EXECUTORS_FOR_TABLE_CACHE =
+ buildConf("spark.sql.finalWriteStage.skipKillingExecutorsForTableCache")
+ .doc("When true, skip killing executors if the plan has table caches.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(true)
+
+ val FINAL_WRITE_STAGE_PARTITION_FACTOR =
+ buildConf("spark.sql.finalWriteStage.retainExecutorsFactor")
+ .doc("If the target executors * factor < active executors, and " +
+ "target executors * factor > min executors, then kill redundant executors.")
+ .version("1.8.0")
+ .doubleConf
+ .checkValue(_ >= 1, "must be bigger than or equal to 1")
+ .createWithDefault(1.2)
+
+ val FINAL_WRITE_STAGE_RESOURCE_ISOLATION_ENABLED =
+ buildConf("spark.sql.finalWriteStage.resourceIsolation.enabled")
+ .doc(
+ "When true, make final write stage resource isolation using custom RDD resource profile.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_WRITE_STAGE_EXECUTOR_CORES =
+ buildConf("spark.sql.finalWriteStage.executorCores")
+ .doc("Specify the executor core request for final write stage. " +
+ "It would be passed to the RDD resource profile.")
+ .version("1.8.0")
+ .intConf
+ .createOptional
+
+ val FINAL_WRITE_STAGE_EXECUTOR_MEMORY =
+ buildConf("spark.sql.finalWriteStage.executorMemory")
+ .doc("Specify the executor on heap memory request for final write stage. " +
+ "It would be passed to the RDD resource profile.")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val FINAL_WRITE_STAGE_EXECUTOR_MEMORY_OVERHEAD =
+ buildConf("spark.sql.finalWriteStage.executorMemoryOverhead")
+ .doc("Specify the executor memory overhead request for final write stage. " +
+ "It would be passed to the RDD resource profile.")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val FINAL_WRITE_STAGE_EXECUTOR_OFF_HEAP_MEMORY =
+ buildConf("spark.sql.finalWriteStage.executorOffHeapMemory")
+ .doc("Specify the executor off heap memory request for final write stage. " +
+ "It would be passed to the RDD resource profile.")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+}
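The doc string of `spark.sql.finalWriteStage.retainExecutorsFactor` describes a threshold check. As a hedged illustration only (this mirrors the documented condition, not the actual implementation; the function name and parameters are invented here):

```scala
// Illustrative predicate for retainExecutorsFactor: kill redundant executors only if
// target * factor is below the active count and above the minimum.
def shouldKillRedundantExecutors(
    targetExecutors: Int,
    factor: Double,
    activeExecutors: Int,
    minExecutors: Int): Boolean = {
  val retained = targetExecutors * factor
  retained < activeExecutors && retained > minExecutors
}
```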
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLExtensionException.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLExtensionException.scala
new file mode 100644
index 00000000000..88c5a988fd9
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLExtensionException.scala
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import java.sql.SQLException
+
+class KyuubiSQLExtensionException(reason: String, cause: Throwable)
+ extends SQLException(reason, cause) {
+
+ def this(reason: String) = {
+ this(reason, null)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala
new file mode 100644
index 00000000000..cc00bf88e94
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import scala.collection.JavaConverters.asScalaBufferConverter
+import scala.collection.mutable.ListBuffer
+
+import org.antlr.v4.runtime.ParserRuleContext
+import org.antlr.v4.runtime.misc.Interval
+import org.antlr.v4.runtime.tree.ParseTree
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedRelation, UnresolvedStar}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.parser.ParserUtils.withOrigin
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project, Sort}
+
+import org.apache.kyuubi.sql.KyuubiSparkSQLParser._
+import org.apache.kyuubi.sql.zorder.{OptimizeZorderStatement, Zorder}
+
+class KyuubiSparkSQLAstBuilder extends KyuubiSparkSQLBaseVisitor[AnyRef] with SQLConfHelper {
+
+ def buildOptimizeStatement(
+ unparsedPredicateOptimize: UnparsedPredicateOptimize,
+ parseExpression: String => Expression): LogicalPlan = {
+
+ val UnparsedPredicateOptimize(tableIdent, tablePredicate, orderExpr) =
+ unparsedPredicateOptimize
+
+ val predicate = tablePredicate.map(parseExpression)
+ verifyPartitionPredicates(predicate)
+ val table = UnresolvedRelation(tableIdent)
+ val tableWithFilter = predicate match {
+ case Some(expr) => Filter(expr, table)
+ case None => table
+ }
+ val query =
+ Sort(
+ SortOrder(orderExpr, Ascending, NullsLast, Seq.empty) :: Nil,
+ conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED),
+ Project(Seq(UnresolvedStar(None)), tableWithFilter))
+ OptimizeZorderStatement(tableIdent, query)
+ }
+
+ private def verifyPartitionPredicates(predicates: Option[Expression]): Unit = {
+ predicates.foreach {
+ case p if !isLikelySelective(p) =>
+ throw new KyuubiSQLExtensionException(s"unsupported partition predicates: ${p.sql}")
+ case _ =>
+ }
+ }
+
+ /**
+ * Forked from Apache Spark's org.apache.spark.sql.catalyst.expressions.PredicateHelper
+   * The `PredicateHelper.isLikelySelective()` method is available since Spark 3.3;
+   * it is forked here to support Spark versions lower than 3.3.
+ *
+ * Returns whether an expression is likely to be selective
+ */
+ private def isLikelySelective(e: Expression): Boolean = e match {
+ case Not(expr) => isLikelySelective(expr)
+ case And(l, r) => isLikelySelective(l) || isLikelySelective(r)
+ case Or(l, r) => isLikelySelective(l) && isLikelySelective(r)
+ case _: StringRegexExpression => true
+ case _: BinaryComparison => true
+ case _: In | _: InSet => true
+ case _: StringPredicate => true
+ case BinaryPredicate(_) => true
+ case _: MultiLikeBase => true
+ case _ => false
+ }
+
+ private object BinaryPredicate {
+ def unapply(expr: Expression): Option[Expression] = expr match {
+ case _: Contains => Option(expr)
+ case _: StartsWith => Option(expr)
+ case _: EndsWith => Option(expr)
+ case _ => None
+ }
+ }
+
+ /**
+ * Create an expression from the given context. This method just passes the context on to the
+ * visitor and only takes care of typing (We assume that the visitor returns an Expression here).
+ */
+ protected def expression(ctx: ParserRuleContext): Expression = typedVisit(ctx)
+
+ protected def multiPart(ctx: ParserRuleContext): Seq[String] = typedVisit(ctx)
+
+ override def visitSingleStatement(ctx: SingleStatementContext): LogicalPlan = {
+ visit(ctx.statement()).asInstanceOf[LogicalPlan]
+ }
+
+ override def visitOptimizeZorder(
+ ctx: OptimizeZorderContext): UnparsedPredicateOptimize = withOrigin(ctx) {
+ val tableIdent = multiPart(ctx.multipartIdentifier())
+
+ val predicate = Option(ctx.whereClause())
+ .map(_.partitionPredicate)
+ .map(extractRawText(_))
+
+ val zorderCols = ctx.zorderClause().order.asScala
+ .map(visitMultipartIdentifier)
+ .map(UnresolvedAttribute(_))
+ .toSeq
+
+ val orderExpr =
+ if (zorderCols.length == 1) {
+ zorderCols.head
+ } else {
+ Zorder(zorderCols)
+ }
+ UnparsedPredicateOptimize(tableIdent, predicate, orderExpr)
+ }
+
+ override def visitPassThrough(ctx: PassThroughContext): LogicalPlan = null
+
+ override def visitMultipartIdentifier(ctx: MultipartIdentifierContext): Seq[String] =
+ withOrigin(ctx) {
+ ctx.parts.asScala.map(_.getText).toSeq
+ }
+
+ override def visitZorderClause(ctx: ZorderClauseContext): Seq[UnresolvedAttribute] =
+ withOrigin(ctx) {
+ val res = ListBuffer[UnresolvedAttribute]()
+ ctx.multipartIdentifier().forEach { identifier =>
+ res += UnresolvedAttribute(identifier.parts.asScala.map(_.getText).toSeq)
+ }
+ res.toSeq
+ }
+
+ private def typedVisit[T](ctx: ParseTree): T = {
+ ctx.accept(this).asInstanceOf[T]
+ }
+
+ private def extractRawText(exprContext: ParserRuleContext): String = {
+ // Extract the raw expression which will be parsed later
+ exprContext.getStart.getInputStream.getText(new Interval(
+ exprContext.getStart.getStartIndex,
+ exprContext.getStop.getStopIndex))
+ }
+}
+
+/**
+ * A logical plan that contains an unparsed expression, which will be parsed by Spark later.
+ */
+trait UnparsedExpressionLogicalPlan extends LogicalPlan {
+ override def output: Seq[Attribute] = throw new UnsupportedOperationException()
+
+ override def children: Seq[LogicalPlan] = throw new UnsupportedOperationException()
+
+ protected def withNewChildrenInternal(
+ newChildren: IndexedSeq[LogicalPlan]): LogicalPlan =
+ throw new UnsupportedOperationException()
+}
+
+case class UnparsedPredicateOptimize(
+ tableIdent: Seq[String],
+ tablePredicate: Option[String],
+ orderExpr: Expression) extends UnparsedExpressionLogicalPlan {}
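The `isLikelySelective` heuristic above is worth pausing on: `And` is considered selective if either side is, while `Or` requires both sides to be. A minimal, self-contained sketch of the same rule on a toy expression AST (not Spark's Catalyst types, just an illustration):

```scala
// Toy AST standing in for Catalyst expressions.
sealed trait Expr
case class Not(e: Expr) extends Expr
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case class Comparison(col: String) extends Expr // stands in for BinaryComparison
case class Literal(v: Boolean) extends Expr     // a non-selective leaf

// Mirrors the heuristic: And needs one selective side, Or needs both,
// Not just delegates to its child.
def isLikelySelective(e: Expr): Boolean = e match {
  case Not(c)    => isLikelySelective(c)
  case And(l, r) => isLikelySelective(l) || isLikelySelective(r)
  case Or(l, r)  => isLikelySelective(l) && isLikelySelective(r)
  case _: Comparison => true
  case _ => false
}
```

So `day = '2023-01-01' AND some_flag` passes the partition-predicate check, but `day = '2023-01-01' OR some_flag` does not, because an `Or` with one unselective side can still match the whole table.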
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLCommonExtension.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLCommonExtension.scala
new file mode 100644
index 00000000000..f39ad3cc390
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLCommonExtension.scala
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.SparkSessionExtensions
+
+import org.apache.kyuubi.sql.zorder.{InsertZorderBeforeWritingDatasource33, InsertZorderBeforeWritingHive33, ResolveZorder}
+
+class KyuubiSparkSQLCommonExtension extends (SparkSessionExtensions => Unit) {
+ override def apply(extensions: SparkSessionExtensions): Unit = {
+ KyuubiSparkSQLCommonExtension.injectCommonExtensions(extensions)
+ }
+}
+
+object KyuubiSparkSQLCommonExtension {
+ def injectCommonExtensions(extensions: SparkSessionExtensions): Unit = {
+ // inject zorder parser and related rules
+ extensions.injectParser { case (_, parser) => new SparkKyuubiSparkSQLParser(parser) }
+ extensions.injectResolutionRule(ResolveZorder)
+
+ // Note that:
+ // InsertZorderBeforeWritingDatasource and InsertZorderBeforeWritingHive
+ // should be applied before
+ // RepartitionBeforeWriting and RebalanceBeforeWriting
+ // because we can only apply one of them (i.e. Global Sort or Repartition/Rebalance)
+ extensions.injectPostHocResolutionRule(InsertZorderBeforeWritingDatasource33)
+ extensions.injectPostHocResolutionRule(InsertZorderBeforeWritingHive33)
+ extensions.injectPostHocResolutionRule(FinalStageConfigIsolationCleanRule)
+
+ extensions.injectQueryStagePrepRule(_ => InsertShuffleNodeBeforeJoin)
+
+ extensions.injectQueryStagePrepRule(FinalStageConfigIsolation(_))
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
new file mode 100644
index 00000000000..792315d897a
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLExtension.scala
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.{FinalStageResourceManager, InjectCustomResourceProfile, SparkSessionExtensions}
+
+import org.apache.kyuubi.sql.watchdog.{ForcedMaxOutputRowsRule, MaxScanStrategy}
+
+// scalastyle:off line.size.limit
+/**
+ * Built on the Spark SQL Extension framework, this extension can be enabled in two steps:
+ * 1. move this jar into $SPARK_HOME/jars
+ * 2. add the config `spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension` to `spark-defaults.conf`
+ */
+// scalastyle:on line.size.limit
+class KyuubiSparkSQLExtension extends (SparkSessionExtensions => Unit) {
+ override def apply(extensions: SparkSessionExtensions): Unit = {
+ KyuubiSparkSQLCommonExtension.injectCommonExtensions(extensions)
+
+ extensions.injectPostHocResolutionRule(RebalanceBeforeWritingDatasource)
+ extensions.injectPostHocResolutionRule(RebalanceBeforeWritingHive)
+ extensions.injectPostHocResolutionRule(DropIgnoreNonexistent)
+
+ // watchdog extension
+ extensions.injectOptimizerRule(ForcedMaxOutputRowsRule)
+ extensions.injectPlannerStrategy(MaxScanStrategy)
+
+ extensions.injectQueryStagePrepRule(FinalStageResourceManager(_))
+ extensions.injectQueryStagePrepRule(InjectCustomResourceProfile)
+ }
+}
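As the scaladoc above notes, enabling the extension is a one-line configuration change once the jar is placed under `$SPARK_HOME/jars`:

```
spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension
```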
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
new file mode 100644
index 00000000000..c4418c33c44
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLParser.scala
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.antlr.v4.runtime._
+import org.antlr.v4.runtime.atn.PredictionMode
+import org.antlr.v4.runtime.misc.{Interval, ParseCancellationException}
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, SQLConfHelper, TableIdentifier}
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.parser.{ParseErrorListener, ParseException, ParserInterface, PostProcessor}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.trees.Origin
+import org.apache.spark.sql.types.{DataType, StructType}
+
+abstract class KyuubiSparkSQLParserBase extends ParserInterface with SQLConfHelper {
+ def delegate: ParserInterface
+ def astBuilder: KyuubiSparkSQLAstBuilder
+
+ override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
+ astBuilder.visit(parser.singleStatement()) match {
+ case optimize: UnparsedPredicateOptimize =>
+ astBuilder.buildOptimizeStatement(optimize, delegate.parseExpression)
+ case plan: LogicalPlan => plan
+ case _ => delegate.parsePlan(sqlText)
+ }
+ }
+
+ protected def parse[T](command: String)(toResult: KyuubiSparkSQLParser => T): T = {
+ val lexer = new KyuubiSparkSQLLexer(
+ new UpperCaseCharStream(CharStreams.fromString(command)))
+ lexer.removeErrorListeners()
+ lexer.addErrorListener(ParseErrorListener)
+
+ val tokenStream = new CommonTokenStream(lexer)
+ val parser = new KyuubiSparkSQLParser(tokenStream)
+ parser.addParseListener(PostProcessor)
+ parser.removeErrorListeners()
+ parser.addErrorListener(ParseErrorListener)
+
+ try {
+ try {
+ // first, try parsing with potentially faster SLL mode
+ parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
+ toResult(parser)
+ } catch {
+ case _: ParseCancellationException =>
+ // if we fail, parse with LL mode
+ tokenStream.seek(0) // rewind input stream
+ parser.reset()
+
+ // Try Again.
+ parser.getInterpreter.setPredictionMode(PredictionMode.LL)
+ toResult(parser)
+ }
+ } catch {
+ case e: ParseException if e.command.isDefined =>
+ throw e
+ case e: ParseException =>
+ throw e.withCommand(command)
+ case e: AnalysisException =>
+ val position = Origin(e.line, e.startPosition)
+ throw new ParseException(Option(command), e.message, position, position)
+ }
+ }
+
+ override def parseExpression(sqlText: String): Expression = {
+ delegate.parseExpression(sqlText)
+ }
+
+ override def parseTableIdentifier(sqlText: String): TableIdentifier = {
+ delegate.parseTableIdentifier(sqlText)
+ }
+
+ override def parseFunctionIdentifier(sqlText: String): FunctionIdentifier = {
+ delegate.parseFunctionIdentifier(sqlText)
+ }
+
+ override def parseMultipartIdentifier(sqlText: String): Seq[String] = {
+ delegate.parseMultipartIdentifier(sqlText)
+ }
+
+ override def parseTableSchema(sqlText: String): StructType = {
+ delegate.parseTableSchema(sqlText)
+ }
+
+ override def parseDataType(sqlText: String): DataType = {
+ delegate.parseDataType(sqlText)
+ }
+
+ /**
+   * This function was introduced in Spark 3.3; for more details, please see
+ * https://github.com/apache/spark/pull/34543
+ */
+ override def parseQuery(sqlText: String): LogicalPlan = {
+ delegate.parseQuery(sqlText)
+ }
+}
+
+class SparkKyuubiSparkSQLParser(
+ override val delegate: ParserInterface)
+ extends KyuubiSparkSQLParserBase {
+ def astBuilder: KyuubiSparkSQLAstBuilder = new KyuubiSparkSQLAstBuilder
+}
+
+/* Copied from Apache Spark's to avoid dependency on Spark Internals */
+class UpperCaseCharStream(wrapped: CodePointCharStream) extends CharStream {
+ override def consume(): Unit = wrapped.consume()
+ override def getSourceName(): String = wrapped.getSourceName
+ override def index(): Int = wrapped.index
+ override def mark(): Int = wrapped.mark
+ override def release(marker: Int): Unit = wrapped.release(marker)
+ override def seek(where: Int): Unit = wrapped.seek(where)
+ override def size(): Int = wrapped.size
+
+ override def getText(interval: Interval): String = wrapped.getText(interval)
+
+ // scalastyle:off
+ override def LA(i: Int): Int = {
+ val la = wrapped.LA(i)
+ if (la == 0 || la == IntStream.EOF) la
+ else Character.toUpperCase(la)
+ }
+ // scalastyle:on
+}
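The `UpperCaseCharStream` wrapper makes keyword matching case-insensitive without mangling the input: the lexer's lookahead (`LA`) sees uppercased characters, while `getText` still returns the original casing. A toy, self-contained sketch of that idea (not ANTLR's actual `CharStream` API):

```scala
// Toy illustration of the UpperCaseCharStream trick: what the lexer "sees"
// via lookahead is uppercased, while the raw text is preserved untouched.
class UpperCaseView(s: String) {
  private val EOF = -1

  // 1-based lookahead, as in ANTLR's IntStream.LA: uppercased for matching.
  def la(i: Int): Int =
    if (i < 1 || i > s.length) EOF
    else Character.toUpperCase(s.charAt(i - 1).toInt)

  // What getText would return: the original input, original casing.
  def text: String = s
}
```

This is why `select col_A` matches the `SELECT` keyword rule while the identifier `col_A` keeps its case for later resolution.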
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RebalanceBeforeWriting.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RebalanceBeforeWriting.scala
new file mode 100644
index 00000000000..3cbacdd2f03
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RebalanceBeforeWriting.scala
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.logical._
+
+trait RepartitionBuilderWithRebalance extends RepartitionBuilder {
+ override def buildRepartition(
+ dynamicPartitionColumns: Seq[Attribute],
+ query: LogicalPlan): LogicalPlan = {
+ if (!conf.getConf(KyuubiSQLConf.INFER_REBALANCE_AND_SORT_ORDERS) ||
+ dynamicPartitionColumns.nonEmpty) {
+ RebalancePartitions(dynamicPartitionColumns, query)
+ } else {
+ val maxColumns = conf.getConf(KyuubiSQLConf.INFER_REBALANCE_AND_SORT_ORDERS_MAX_COLUMNS)
+ val inferred = InferRebalanceAndSortOrders.infer(query)
+ if (inferred.isDefined) {
+ val (partitioning, ordering) = inferred.get
+ val rebalance = RebalancePartitions(partitioning.take(maxColumns), query)
+ if (ordering.nonEmpty) {
+ val sortOrders = ordering.take(maxColumns).map(o => SortOrder(o, Ascending))
+ Sort(sortOrders, false, rebalance)
+ } else {
+ rebalance
+ }
+ } else {
+ RebalancePartitions(dynamicPartitionColumns, query)
+ }
+ }
+ }
+
+ override def canInsertRepartitionByExpression(plan: LogicalPlan): Boolean = {
+ super.canInsertRepartitionByExpression(plan) && {
+ plan match {
+ case _: RebalancePartitions => false
+ case _ => true
+ }
+ }
+ }
+}
+
+/**
+ * For datasource tables, there are two commands that can write data to a table:
+ * 1. InsertIntoHadoopFsRelationCommand
+ * 2. CreateDataSourceTableAsSelectCommand
+ * This rule adds a RebalancePartitions node between the write command and the query.
+ */
+case class RebalanceBeforeWritingDatasource(session: SparkSession)
+ extends RepartitionBeforeWritingDatasourceBase
+ with RepartitionBuilderWithRebalance {}
+
+/**
+ * For Hive tables, there are two commands that can write data to a table:
+ * 1. InsertIntoHiveTable
+ * 2. CreateHiveTableAsSelectCommand
+ * This rule adds a RebalancePartitions node between the write command and the query.
+ */
+case class RebalanceBeforeWritingHive(session: SparkSession)
+ extends RepartitionBeforeWritingHiveBase
+ with RepartitionBuilderWithRebalance {}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RepartitionBeforeWritingBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RepartitionBeforeWritingBase.scala
new file mode 100644
index 00000000000..3ebb9740f5f
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/RepartitionBeforeWritingBase.scala
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+import org.apache.spark.sql.internal.StaticSQLConf
+
+trait RepartitionBuilder extends Rule[LogicalPlan] with RepartitionBeforeWriteHelper {
+ def buildRepartition(
+ dynamicPartitionColumns: Seq[Attribute],
+ query: LogicalPlan): LogicalPlan
+}
+
+/**
+ * For datasource tables, there are two commands that can write data to a table:
+ * 1. InsertIntoHadoopFsRelationCommand
+ * 2. CreateDataSourceTableAsSelectCommand
+ * This rule adds a repartition node between the write command and the query.
+ */
+abstract class RepartitionBeforeWritingDatasourceBase extends RepartitionBuilder {
+
+ override def apply(plan: LogicalPlan): LogicalPlan = {
+ if (conf.getConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE)) {
+ addRepartition(plan)
+ } else {
+ plan
+ }
+ }
+
+ private def addRepartition(plan: LogicalPlan): LogicalPlan = plan match {
+ case i @ InsertIntoHadoopFsRelationCommand(_, sp, _, pc, bucket, _, _, query, _, _, _, _)
+ if query.resolved && bucket.isEmpty && canInsertRepartitionByExpression(query) =>
+ val dynamicPartitionColumns = pc.filterNot(attr => sp.contains(attr.name))
+ i.copy(query = buildRepartition(dynamicPartitionColumns, query))
+
+ case u @ Union(children, _, _) =>
+ u.copy(children = children.map(addRepartition))
+
+ case _ => plan
+ }
+}
+
+/**
+ * For Hive tables, there are two commands that can write data to a table:
+ * 1. InsertIntoHiveTable
+ * 2. CreateHiveTableAsSelectCommand
+ * This rule adds a repartition node between the write command and the query.
+ */
+abstract class RepartitionBeforeWritingHiveBase extends RepartitionBuilder {
+ override def apply(plan: LogicalPlan): LogicalPlan = {
+ if (conf.getConf(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive" &&
+ conf.getConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE)) {
+ addRepartition(plan)
+ } else {
+ plan
+ }
+ }
+
+ def addRepartition(plan: LogicalPlan): LogicalPlan = plan match {
+ case i @ InsertIntoHiveTable(table, partition, query, _, _, _, _, _, _, _, _)
+ if query.resolved && table.bucketSpec.isEmpty && canInsertRepartitionByExpression(query) =>
+ val dynamicPartitionColumns = partition.filter(_._2.isEmpty).keys
+ .flatMap(name => query.output.find(_.name == name)).toSeq
+ i.copy(query = buildRepartition(dynamicPartitionColumns, query))
+
+ case u @ Union(children, _, _) =>
+ u.copy(children = children.map(addRepartition))
+
+ case _ => plan
+ }
+}
+
+trait RepartitionBeforeWriteHelper extends Rule[LogicalPlan] {
+ private def hasBenefit(plan: LogicalPlan): Boolean = {
+ def probablyHasShuffle: Boolean = plan.find {
+ case _: Join => true
+ case _: Aggregate => true
+ case _: Distinct => true
+ case _: Deduplicate => true
+ case _: Window => true
+ case s: Sort if s.global => true
+ case _: RepartitionOperation => true
+ case _: GlobalLimit => true
+ case _ => false
+ }.isDefined
+
+ conf.getConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE) || probablyHasShuffle
+ }
+
+ def canInsertRepartitionByExpression(plan: LogicalPlan): Boolean = {
+ def canInsert(p: LogicalPlan): Boolean = p match {
+ case Project(_, child) => canInsert(child)
+ case SubqueryAlias(_, child) => canInsert(child)
+ case Limit(_, _) => false
+ case _: Sort => false
+ case _: RepartitionByExpression => false
+ case _: Repartition => false
+ case _ => true
+ }
+
+    // 1. make sure AQE is enabled, otherwise there is no point in adding a shuffle
+ // 2. make sure it does not break the semantics of original plan
+ // 3. try to avoid adding a shuffle if it has potential performance regression
+ conf.adaptiveExecutionEnabled && canInsert(plan) && hasBenefit(plan)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/WriteUtils.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/WriteUtils.scala
new file mode 100644
index 00000000000..89dd8319480
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/WriteUtils.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.{SparkPlan, UnionExec}
+import org.apache.spark.sql.execution.command.DataWritingCommandExec
+import org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec
+
+object WriteUtils {
+ def isWrite(session: SparkSession, plan: SparkPlan): Boolean = {
+ plan match {
+ case _: DataWritingCommandExec => true
+ case _: V2TableWriteExec => true
+ case u: UnionExec if u.children.nonEmpty => u.children.forall(isWrite(session, _))
+ case _ => false
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsBase.scala
new file mode 100644
index 00000000000..4f897d1b600
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsBase.scala
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.watchdog
+
+import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.DataWritingCommand
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+/*
+ * The ForcedMaxOutputRows rule limits the number of output rows,
+ * avoiding unexpectedly huge results from queries without a LIMIT clause.
+ * It mainly applies to cases such as:
+ *
+ * case 1:
+ * {{{
+ * SELECT [c1, c2, ...]
+ * }}}
+ *
+ * case 2:
+ * {{{
+ * WITH CTE AS (
+ * ...)
+ * SELECT [c1, c2, ...] FROM CTE ...
+ * }}}
+ *
+ * This logical rule adds a GlobalLimit node above the root Project.
+ */
+trait ForcedMaxOutputRowsBase extends Rule[LogicalPlan] {
+
+ protected def isChildAggregate(a: Aggregate): Boolean
+
+ protected def canInsertLimitInner(p: LogicalPlan): Boolean = p match {
+ case Aggregate(_, Alias(_, "havingCondition") :: Nil, _) => false
+ case agg: Aggregate => !isChildAggregate(agg)
+ case _: RepartitionByExpression => true
+ case _: Distinct => true
+ case _: Filter => true
+ case _: Project => true
+ case Limit(_, _) => true
+ case _: Sort => true
+ case Union(children, _, _) =>
+ if (children.exists(_.isInstanceOf[DataWritingCommand])) {
+ false
+ } else {
+ true
+ }
+ case _: MultiInstanceRelation => true
+ case _: Join => true
+ case _ => false
+ }
+
+ protected def canInsertLimit(p: LogicalPlan, maxOutputRowsOpt: Option[Int]): Boolean = {
+ maxOutputRowsOpt match {
+ case Some(forcedMaxOutputRows) => canInsertLimitInner(p) &&
+ !p.maxRows.exists(_ <= forcedMaxOutputRows)
+ case None => false
+ }
+ }
+
+ override def apply(plan: LogicalPlan): LogicalPlan = {
+ val maxOutputRowsOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS)
+ plan match {
+ case p if p.resolved && canInsertLimit(p, maxOutputRowsOpt) =>
+ Limit(
+ maxOutputRowsOpt.get,
+ plan)
+ case _ => plan
+ }
+ }
+}
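One detail in `canInsertLimit` above is the `maxRows` guard: no limit is inserted when the plan already guarantees at most `forcedMaxOutputRows` rows. A tiny self-contained sketch of just that guard (ignoring the `canInsertLimitInner` plan-shape check):

```scala
// Mirrors the maxRows guard: insert a limit only when a forced limit is
// configured AND the plan's known row bound (if any) exceeds it.
def shouldInsertLimit(planMaxRows: Option[Long], forced: Option[Int]): Boolean =
  forced.exists(f => !planMaxRows.exists(_ <= f))
```

So a query that already ends in `LIMIT 10` (so `maxRows = Some(10)`) is left alone under a forced cap of 1000, while an unbounded query gets the extra `GlobalLimit`.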
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsRule.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsRule.scala
new file mode 100644
index 00000000000..a3d990b1098
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/ForcedMaxOutputRowsRule.scala
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.watchdog
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, CommandResult, LogicalPlan, Union, WithCTE}
+import org.apache.spark.sql.execution.command.DataWritingCommand
+
+case class ForcedMaxOutputRowsRule(sparkSession: SparkSession) extends ForcedMaxOutputRowsBase {
+
+ override protected def isChildAggregate(a: Aggregate): Boolean = false
+
+ override protected def canInsertLimitInner(p: LogicalPlan): Boolean = p match {
+ case WithCTE(plan, _) => this.canInsertLimitInner(plan)
+ case plan: LogicalPlan => plan match {
+ case Union(children, _, _) => !children.exists {
+ case _: DataWritingCommand => true
+ case p: CommandResult if p.commandLogicalPlan.isInstanceOf[DataWritingCommand] => true
+ case _ => false
+ }
+ case _ => super.canInsertLimitInner(plan)
+ }
+ }
+
+ override protected def canInsertLimit(p: LogicalPlan, maxOutputRowsOpt: Option[Int]): Boolean = {
+ p match {
+ case WithCTE(plan, _) => this.canInsertLimit(plan, maxOutputRowsOpt)
+ case _ => super.canInsertLimit(p, maxOutputRowsOpt)
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala
new file mode 100644
index 00000000000..e44309192a9
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.watchdog
+
+import org.apache.kyuubi.sql.KyuubiSQLExtensionException
+
+final class MaxPartitionExceedException(
+ private val reason: String = "",
+ private val cause: Throwable = None.orNull)
+ extends KyuubiSQLExtensionException(reason, cause)
+
+final class MaxFileSizeExceedException(
+ private val reason: String = "",
+ private val cause: Throwable = None.orNull)
+ extends KyuubiSQLExtensionException(reason, cause)
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
new file mode 100644
index 00000000000..1ed55ebc2fd
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.watchdog
+
+import org.apache.hadoop.fs.Path
+import org.apache.spark.sql.{PruneFileSourcePartitionHelper, SparkSession, Strategy}
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
+import org.apache.spark.sql.catalyst.planning.ScanOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
+import org.apache.spark.sql.types.StructType
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+/**
+ * MaxScanStrategy guards against scanning excessive partitions or files:
+ * 1. checks whether the scan exceeds maxPartition for a partitioned table
+ * 2. checks whether the scan exceeds maxFileSize (calculated from Hive table and
+ *    partition statistics)
+ * This strategy is injected as a planner strategy after the logical optimizer.
+ * @param session the active SparkSession
+ */
+case class MaxScanStrategy(session: SparkSession)
+ extends Strategy
+ with SQLConfHelper
+ with PruneFileSourcePartitionHelper {
+ override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
+ val maxScanPartitionsOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_MAX_PARTITIONS)
+ val maxFileSizeOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE)
+ if (maxScanPartitionsOpt.isDefined || maxFileSizeOpt.isDefined) {
+ checkScan(plan, maxScanPartitionsOpt, maxFileSizeOpt)
+ }
+ Nil
+ }
+
+ private def checkScan(
+ plan: LogicalPlan,
+ maxScanPartitionsOpt: Option[Int],
+ maxFileSizeOpt: Option[Long]): Unit = {
+ plan match {
+ case ScanOperation(_, _, _, relation: HiveTableRelation) =>
+ if (relation.isPartitioned) {
+ relation.prunedPartitions match {
+ case Some(prunedPartitions) =>
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitions.size)) {
+ throw new MaxPartitionExceedException(
+ s"""
+                 |SQL job scans ${prunedPartitions.size} hive partitions,
+                 |exceeding the hive scan maxPartition limit of ${maxScanPartitionsOpt.get}.
+                 |You should optimize your SQL logic according to the partition structure
+                 |or narrow the query scope, e.g. by p_date. Details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+ lazy val scanFileSize = prunedPartitions.flatMap(_.stats).map(_.sizeInBytes).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ Some(relation.tableMeta),
+ prunedPartitions.flatMap(_.storage.locationUri).map(_.toString),
+ relation.partitionCols.map(_.name))
+ }
+ case _ =>
+ lazy val scanPartitions: Int = session
+ .sessionState.catalog.externalCatalog.listPartitionNames(
+ relation.tableMeta.database,
+ relation.tableMeta.identifier.table).size
+ if (maxScanPartitionsOpt.exists(_ < scanPartitions)) {
+ throw new MaxPartitionExceedException(
+ s"""
+                |Your SQL job scans a whole huge table without any partition filter.
+                |You should optimize your SQL logic according to the partition structure
+                |or narrow the query scope, e.g. by p_date. Details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+
+ lazy val scanFileSize: BigInt =
+ relation.tableMeta.stats.map(_.sizeInBytes).getOrElse {
+ session
+ .sessionState.catalog.externalCatalog.listPartitions(
+ relation.tableMeta.database,
+ relation.tableMeta.identifier.table).flatMap(_.stats).map(_.sizeInBytes).sum
+ }
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw new MaxFileSizeExceedException(
+ s"""
+                |Your SQL job scans a whole huge table without any partition filter.
+                |You should optimize your SQL logic according to the partition structure
+                |or narrow the query scope, e.g. by p_date. Details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+ }
+ } else {
+ lazy val scanFileSize = relation.tableMeta.stats.map(_.sizeInBytes).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ Some(relation.tableMeta))
+ }
+ }
+ case ScanOperation(
+ _,
+ _,
+ filters,
+ relation @ LogicalRelation(
+ fsRelation @ HadoopFsRelation(
+ fileIndex: InMemoryFileIndex,
+ partitionSchema,
+ _,
+ _,
+ _,
+ _),
+ _,
+ _,
+ _)) =>
+ if (fsRelation.partitionSchema.nonEmpty) {
+ val (partitionKeyFilters, dataFilter) =
+ getPartitionKeyFiltersAndDataFilters(
+ SparkSession.active,
+ relation,
+ partitionSchema,
+ filters,
+ relation.output)
+ val prunedPartitions = fileIndex.listFiles(
+ partitionKeyFilters.toSeq,
+ dataFilter)
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitions.size)) {
+ throw maxPartitionExceedError(
+ prunedPartitions.size,
+ maxScanPartitionsOpt.get,
+ relation.catalogTable,
+ fileIndex.rootPaths,
+ fsRelation.partitionSchema)
+ }
+ lazy val scanFileSize = prunedPartitions.flatMap(_.files).map(_.getLen).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ relation.catalogTable,
+ fileIndex.rootPaths.map(_.toString),
+ fsRelation.partitionSchema.map(_.name))
+ }
+ } else {
+ lazy val scanFileSize = fileIndex.sizeInBytes
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ relation.catalogTable)
+ }
+ }
+ case ScanOperation(
+ _,
+ _,
+ filters,
+ logicalRelation @ LogicalRelation(
+ fsRelation @ HadoopFsRelation(
+ catalogFileIndex: CatalogFileIndex,
+ partitionSchema,
+ _,
+ _,
+ _,
+ _),
+ _,
+ _,
+ _)) =>
+ if (fsRelation.partitionSchema.nonEmpty) {
+ val (partitionKeyFilters, _) =
+ getPartitionKeyFiltersAndDataFilters(
+ SparkSession.active,
+ logicalRelation,
+ partitionSchema,
+ filters,
+ logicalRelation.output)
+
+ val fileIndex = catalogFileIndex.filterPartitions(
+ partitionKeyFilters.toSeq)
+
+ lazy val prunedPartitionSize = fileIndex.partitionSpec().partitions.size
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitionSize)) {
+ throw maxPartitionExceedError(
+ prunedPartitionSize,
+ maxScanPartitionsOpt.get,
+ logicalRelation.catalogTable,
+ catalogFileIndex.rootPaths,
+ fsRelation.partitionSchema)
+ }
+
+ lazy val scanFileSize = fileIndex
+ .listFiles(Nil, Nil).flatMap(_.files).map(_.getLen).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ logicalRelation.catalogTable,
+ catalogFileIndex.rootPaths.map(_.toString),
+ fsRelation.partitionSchema.map(_.name))
+ }
+ } else {
+ lazy val scanFileSize = catalogFileIndex.sizeInBytes
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ logicalRelation.catalogTable)
+ }
+ }
+ case _ =>
+ }
+ }
+
+ def maxPartitionExceedError(
+ prunedPartitionSize: Int,
+ maxPartitionSize: Int,
+ tableMeta: Option[CatalogTable],
+ rootPaths: Seq[Path],
+ partitionSchema: StructType): Throwable = {
+ val truncatedPaths =
+ if (rootPaths.length > 5) {
+        rootPaths.take(5).mkString(",") + "... " + (rootPaths.length - 5) + " more paths"
+ } else {
+ rootPaths.mkString(",")
+ }
+
+ new MaxPartitionExceedException(
+ s"""
+       |SQL job scans $prunedPartitionSize data source partitions,
+       |exceeding the data source scan maxPartition limit of $maxPartitionSize.
+       |You should optimize your SQL logic according to the partition structure
+       |or narrow the query scope, e.g. by p_date. Details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |RootPaths: $truncatedPaths
+ |Partition Structure: ${partitionSchema.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+
+ private def partTableMaxFileExceedError(
+ scanFileSize: Number,
+ maxFileSize: Long,
+ tableMeta: Option[CatalogTable],
+ rootPaths: Seq[String],
+ partitions: Seq[String]): Throwable = {
+ val truncatedPaths =
+ if (rootPaths.length > 5) {
+        rootPaths.take(5).mkString(",") + "... " + (rootPaths.length - 5) + " more paths"
+ } else {
+ rootPaths.mkString(",")
+ }
+
+ new MaxFileSizeExceedException(
+ s"""
+       |SQL job scans $scanFileSize bytes of files,
+       |exceeding the table scan maxFileSize limit of $maxFileSize.
+       |You should optimize your SQL logic according to the partition structure
+       |or narrow the query scope, e.g. by p_date. Details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |RootPaths: $truncatedPaths
+ |Partition Structure: ${partitions.mkString(", ")}
+ |""".stripMargin)
+ }
+
+ private def nonPartTableMaxFileExceedError(
+ scanFileSize: Number,
+ maxFileSize: Long,
+ tableMeta: Option[CatalogTable]): Throwable = {
+ new MaxFileSizeExceedException(
+ s"""
+       |SQL job scans $scanFileSize bytes of files,
+       |exceeding the table scan maxFileSize limit of $maxFileSize.
+       |Details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |Location: ${tableMeta.map(_.location).getOrElse("")}
+ |""".stripMargin)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWriting.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWriting.scala
new file mode 100644
index 00000000000..b3f98ec6d7f
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWriting.scala
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, Expression, NullsLast, SortOrder}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+
+import org.apache.kyuubi.sql.{KyuubiSQLConf, KyuubiSQLExtensionException}
+
+trait InsertZorderHelper33 extends Rule[LogicalPlan] with ZorderBuilder {
+ private val KYUUBI_ZORDER_ENABLED = "kyuubi.zorder.enabled"
+ private val KYUUBI_ZORDER_COLS = "kyuubi.zorder.cols"
+
+ def isZorderEnabled(props: Map[String, String]): Boolean = {
+ props.contains(KYUUBI_ZORDER_ENABLED) &&
+ "true".equalsIgnoreCase(props(KYUUBI_ZORDER_ENABLED)) &&
+ props.contains(KYUUBI_ZORDER_COLS)
+ }
+
+ def getZorderColumns(props: Map[String, String]): Seq[String] = {
+ val cols = props.get(KYUUBI_ZORDER_COLS)
+ assert(cols.isDefined)
+ cols.get.split(",").map(_.trim)
+ }
+
+ def canInsertZorder(query: LogicalPlan): Boolean = query match {
+ case Project(_, child) => canInsertZorder(child)
+    // TODO: actually, we can force zorder even if some shuffle already exists
+ case _: Sort => false
+ case _: RepartitionByExpression => false
+ case _: Repartition => false
+ case _ => true
+ }
+
+ def insertZorder(
+ catalogTable: CatalogTable,
+ plan: LogicalPlan,
+ dynamicPartitionColumns: Seq[Attribute]): LogicalPlan = {
+ if (!canInsertZorder(plan)) {
+ return plan
+ }
+ val cols = getZorderColumns(catalogTable.properties)
+ val resolver = session.sessionState.conf.resolver
+ val output = plan.output
+ val bound = cols.flatMap(col => output.find(attr => resolver(attr.name, col)))
+ if (bound.size < cols.size) {
+      logWarning(s"Target table does not contain all zorder cols: ${cols.mkString(",")}; " +
+        s"please check the table property ${KYUUBI_ZORDER_COLS}.")
+ plan
+ } else {
+ if (conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED) &&
+ conf.getConf(KyuubiSQLConf.REBALANCE_BEFORE_ZORDER)) {
+ throw new KyuubiSQLExtensionException(s"${KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED.key} " +
+ s"and ${KyuubiSQLConf.REBALANCE_BEFORE_ZORDER.key} can not be enabled together.")
+ }
+ if (conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED) &&
+ dynamicPartitionColumns.nonEmpty) {
+ logWarning(s"Dynamic partition insertion with global sort may produce small files.")
+ }
+
+ val zorderExpr =
+ if (bound.length == 1) {
+ bound
+ } else if (conf.getConf(KyuubiSQLConf.ZORDER_USING_ORIGINAL_ORDERING_ENABLED)) {
+ bound.asInstanceOf[Seq[Expression]]
+ } else {
+ buildZorder(bound) :: Nil
+ }
+ val (global, orderExprs, child) =
+ if (conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED)) {
+ (true, zorderExpr, plan)
+ } else if (conf.getConf(KyuubiSQLConf.REBALANCE_BEFORE_ZORDER)) {
+ val rebalanceExpr =
+ if (dynamicPartitionColumns.isEmpty) {
+ // static partition insert
+ bound
+ } else if (conf.getConf(KyuubiSQLConf.REBALANCE_ZORDER_COLUMNS_ENABLED)) {
+ // improve data compression ratio
+ dynamicPartitionColumns.asInstanceOf[Seq[Expression]] ++ bound
+ } else {
+ dynamicPartitionColumns.asInstanceOf[Seq[Expression]]
+ }
+          // for dynamic partition inserts, Spark always sorts the partition columns,
+          // so here we sort partition columns + zorder.
+ val rebalance =
+ if (dynamicPartitionColumns.nonEmpty &&
+ conf.getConf(KyuubiSQLConf.TWO_PHASE_REBALANCE_BEFORE_ZORDER)) {
+ // improve compression ratio
+ RebalancePartitions(
+ rebalanceExpr,
+ RebalancePartitions(dynamicPartitionColumns, plan))
+ } else {
+ RebalancePartitions(rebalanceExpr, plan)
+ }
+ (false, dynamicPartitionColumns.asInstanceOf[Seq[Expression]] ++ zorderExpr, rebalance)
+ } else {
+ (false, zorderExpr, plan)
+ }
+ val order = orderExprs.map { expr =>
+ SortOrder(expr, Ascending, NullsLast, Seq.empty)
+ }
+ Sort(order, global, child)
+ }
+ }
+
+ override def buildZorder(children: Seq[Expression]): ZorderBase = Zorder(children)
+
+ def session: SparkSession
+ def applyInternal(plan: LogicalPlan): LogicalPlan
+
+ final override def apply(plan: LogicalPlan): LogicalPlan = {
+ if (conf.getConf(KyuubiSQLConf.INSERT_ZORDER_BEFORE_WRITING)) {
+ applyInternal(plan)
+ } else {
+ plan
+ }
+ }
+}
+
+case class InsertZorderBeforeWritingDatasource33(session: SparkSession)
+ extends InsertZorderHelper33 {
+ override def applyInternal(plan: LogicalPlan): LogicalPlan = plan match {
+ case insert: InsertIntoHadoopFsRelationCommand
+ if insert.query.resolved &&
+ insert.bucketSpec.isEmpty && insert.catalogTable.isDefined &&
+ isZorderEnabled(insert.catalogTable.get.properties) =>
+ val dynamicPartition =
+ insert.partitionColumns.filterNot(attr => insert.staticPartitions.contains(attr.name))
+ val newQuery = insertZorder(insert.catalogTable.get, insert.query, dynamicPartition)
+ if (newQuery.eq(insert.query)) {
+ insert
+ } else {
+ insert.copy(query = newQuery)
+ }
+
+ case _ => plan
+ }
+}
+
+case class InsertZorderBeforeWritingHive33(session: SparkSession)
+ extends InsertZorderHelper33 {
+ override def applyInternal(plan: LogicalPlan): LogicalPlan = plan match {
+ case insert: InsertIntoHiveTable
+ if insert.query.resolved &&
+ insert.table.bucketSpec.isEmpty && isZorderEnabled(insert.table.properties) =>
+ val dynamicPartition = insert.partition.filter(_._2.isEmpty).keys
+ .flatMap(name => insert.query.output.find(_.name == name)).toSeq
+ val newQuery = insertZorder(insert.table, insert.query, dynamicPartition)
+ if (newQuery.eq(insert.query)) {
+ insert
+ } else {
+ insert.copy(query = newQuery)
+ }
+
+ case _ => plan
+ }
+}
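
For context (not part of the patch), a sketch of how the rebalance/sort behavior enforced by `insertZorder` above might be toggled. The conf key names below are assumptions; see `KyuubiSQLConf` for the authoritative definitions of `ZORDER_GLOBAL_SORT_ENABLED` and `REBALANCE_BEFORE_ZORDER`:

```scala
// Sketch only; requires a Spark session with the Kyuubi SQL extension installed.
// Key names are assumptions -- check KyuubiSQLConf for the real entries.
spark.conf.set("spark.sql.optimizer.zorderGlobalSort.enabled", "false")
spark.conf.set("spark.sql.optimizer.rebalanceBeforeZorder.enabled", "true")

// As the guard in insertZorder enforces, enabling both of these at once
// throws KyuubiSQLExtensionException.
```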
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWritingBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWritingBase.scala
new file mode 100644
index 00000000000..2c59d148e98
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/InsertZorderBeforeWritingBase.scala
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import java.util.Locale
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Expression, NullsLast, SortOrder}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+/**
+ * TODO: shall we forbid zorder for dynamic partition inserts?
+ * Inserts zorder before writing to a datasource if the target table's properties enable zorder
+ */
+abstract class InsertZorderBeforeWritingDatasourceBase
+ extends InsertZorderHelper {
+ override def applyInternal(plan: LogicalPlan): LogicalPlan = plan match {
+ case insert: InsertIntoHadoopFsRelationCommand
+ if insert.query.resolved && insert.bucketSpec.isEmpty && insert.catalogTable.isDefined &&
+ isZorderEnabled(insert.catalogTable.get.properties) =>
+ val newQuery = insertZorder(insert.catalogTable.get, insert.query)
+ if (newQuery.eq(insert.query)) {
+ insert
+ } else {
+ insert.copy(query = newQuery)
+ }
+ case _ => plan
+ }
+}
+
+/**
+ * TODO: shall we forbid zorder for dynamic partition inserts?
+ * Inserts zorder before writing to a Hive table if the target table's properties enable zorder
+ */
+abstract class InsertZorderBeforeWritingHiveBase
+ extends InsertZorderHelper {
+ override def applyInternal(plan: LogicalPlan): LogicalPlan = plan match {
+ case insert: InsertIntoHiveTable
+ if insert.query.resolved && insert.table.bucketSpec.isEmpty &&
+ isZorderEnabled(insert.table.properties) =>
+ val newQuery = insertZorder(insert.table, insert.query)
+ if (newQuery.eq(insert.query)) {
+ insert
+ } else {
+ insert.copy(query = newQuery)
+ }
+ case _ => plan
+ }
+}
+
+trait ZorderBuilder {
+ def buildZorder(children: Seq[Expression]): ZorderBase
+}
+
+trait InsertZorderHelper extends Rule[LogicalPlan] with ZorderBuilder {
+ private val KYUUBI_ZORDER_ENABLED = "kyuubi.zorder.enabled"
+ private val KYUUBI_ZORDER_COLS = "kyuubi.zorder.cols"
+
+ def isZorderEnabled(props: Map[String, String]): Boolean = {
+ props.contains(KYUUBI_ZORDER_ENABLED) &&
+ "true".equalsIgnoreCase(props(KYUUBI_ZORDER_ENABLED)) &&
+ props.contains(KYUUBI_ZORDER_COLS)
+ }
+
+ def getZorderColumns(props: Map[String, String]): Seq[String] = {
+ val cols = props.get(KYUUBI_ZORDER_COLS)
+ assert(cols.isDefined)
+ cols.get.split(",").map(_.trim.toLowerCase(Locale.ROOT))
+ }
+
+ def canInsertZorder(query: LogicalPlan): Boolean = query match {
+ case Project(_, child) => canInsertZorder(child)
+    // TODO: actually, we can force zorder even if some shuffle already exists
+ case _: Sort => false
+ case _: RepartitionByExpression => false
+ case _: Repartition => false
+ case _ => true
+ }
+
+ def insertZorder(catalogTable: CatalogTable, plan: LogicalPlan): LogicalPlan = {
+ if (!canInsertZorder(plan)) {
+ return plan
+ }
+ val cols = getZorderColumns(catalogTable.properties)
+ val attrs = plan.output.map(attr => (attr.name, attr)).toMap
+ if (cols.exists(!attrs.contains(_))) {
+      logWarning(s"Target table does not contain all zorder cols: ${cols.mkString(",")}; " +
+        s"please check the table property ${KYUUBI_ZORDER_COLS}.")
+ plan
+ } else {
+ val bound = cols.map(attrs(_))
+ val orderExpr =
+ if (bound.length == 1) {
+ bound.head
+ } else {
+ buildZorder(bound)
+ }
+      // TODO: We can rebalance partitions before the local zorder sort since Spark 3.3,
+      // see https://github.com/apache/spark/pull/34542
+ Sort(
+ SortOrder(orderExpr, Ascending, NullsLast, Seq.empty) :: Nil,
+ conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED),
+ plan)
+ }
+ }
+
+ def applyInternal(plan: LogicalPlan): LogicalPlan
+
+ final override def apply(plan: LogicalPlan): LogicalPlan = {
+ if (conf.getConf(KyuubiSQLConf.INSERT_ZORDER_BEFORE_WRITING)) {
+ applyInternal(plan)
+ } else {
+ plan
+ }
+ }
+}
+
+/**
+ * TODO: shall we forbid zorder for dynamic partition inserts?
+ * Inserts zorder before writing to a datasource if the target table's properties enable zorder
+ */
+case class InsertZorderBeforeWritingDatasource(session: SparkSession)
+ extends InsertZorderBeforeWritingDatasourceBase {
+ override def buildZorder(children: Seq[Expression]): ZorderBase = Zorder(children)
+}
+
+/**
+ * TODO: shall we forbid zorder for dynamic partition inserts?
+ * Inserts zorder before writing to a Hive table if the target table's properties enable zorder
+ */
+case class InsertZorderBeforeWritingHive(session: SparkSession)
+ extends InsertZorderBeforeWritingHiveBase {
+ override def buildZorder(children: Seq[Expression]): ZorderBase = Zorder(children)
+}
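
For context (not part of the patch): these rules fire only when the target table carries the zorder properties the helper checks for (`kyuubi.zorder.enabled`, `kyuubi.zorder.cols`). A hedged usage sketch, assuming a hypothetical table `db.events` with columns `user_id` and `event_time`:

```scala
// Sketch only; requires a Spark session with the Kyuubi SQL extension installed.
// db.events, user_id and event_time are hypothetical names for illustration.
spark.sql(
  """ALTER TABLE db.events SET TBLPROPERTIES (
    |  'kyuubi.zorder.enabled' = 'true',
    |  'kyuubi.zorder.cols' = 'user_id,event_time')""".stripMargin)

// Subsequent inserts into db.events now get a zorder Sort injected before the
// write, provided INSERT_ZORDER_BEFORE_WRITING is enabled in KyuubiSQLConf.
```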
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderCommandBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderCommandBase.scala
new file mode 100644
index 00000000000..21d1cf2a25b
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderCommandBase.scala
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+
+import org.apache.kyuubi.sql.KyuubiSQLExtensionException
+
+/**
+ * A runnable command for zorder; it delegates execution to the underlying writing command
+ */
+abstract class OptimizeZorderCommandBase extends DataWritingCommand {
+ def catalogTable: CatalogTable
+
+ override def outputColumnNames: Seq[String] = query.output.map(_.name)
+
+  private def isHiveTable: Boolean = {
+    // either no provider (legacy Hive DDL) or an explicit "hive" provider
+    catalogTable.provider.forall(p => "hive".equalsIgnoreCase(p))
+  }
+
+ private def getWritingCommand(session: SparkSession): DataWritingCommand = {
+    // TODO: Support converting a hive relation to a datasource relation; see
+    // [[org.apache.spark.sql.hive.RelationConversions]]
+ InsertIntoHiveTable(
+ catalogTable,
+ catalogTable.partitionColumnNames.map(p => (p, None)).toMap,
+ query,
+ overwrite = true,
+ ifPartitionNotExists = false,
+ outputColumnNames)
+ }
+
+ override def run(session: SparkSession, child: SparkPlan): Seq[Row] = {
+ // TODO: Support datasource relation
+ // TODO: Support read and insert overwrite the same table for some table format
+ if (!isHiveTable) {
+      throw new KyuubiSQLExtensionException("Only Hive tables are supported")
+ }
+
+ val command = getWritingCommand(session)
+ command.run(session, child)
+ DataWritingCommand.propogateMetrics(session.sparkContext, command, metrics)
+ Seq.empty
+ }
+}
+
+/**
+ * A runnable command for zorder; it delegates execution to the underlying writing command
+ */
+case class OptimizeZorderCommand(
+ catalogTable: CatalogTable,
+ query: LogicalPlan)
+ extends OptimizeZorderCommandBase {
+ protected def withNewChildInternal(newChild: LogicalPlan): LogicalPlan = {
+ copy(query = newChild)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala
new file mode 100644
index 00000000000..895f9e24be3
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
+
+/**
+ * A zorder statement parsed from SQL.
+ * The analyzer should convert this plan into a concrete command.
+ */
+case class OptimizeZorderStatement(
+ tableIdentifier: Seq[String],
+ query: LogicalPlan) extends UnaryNode {
+ override def child: LogicalPlan = query
+ override def output: Seq[Attribute] = child.output
+ protected def withNewChildInternal(newChild: LogicalPlan): LogicalPlan =
+ copy(query = newChild)
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala
new file mode 100644
index 00000000000..9f735caa7a7
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.catalyst.rules.Rule
+
+import org.apache.kyuubi.sql.KyuubiSQLExtensionException
+
+/**
+ * Resolve `OptimizeZorderStatement` to `OptimizeZorderCommand`
+ */
+abstract class ResolveZorderBase extends Rule[LogicalPlan] {
+ def session: SparkSession
+ def buildOptimizeZorderCommand(
+ catalogTable: CatalogTable,
+ query: LogicalPlan): OptimizeZorderCommandBase
+
+ protected def checkQueryAllowed(query: LogicalPlan): Unit = query foreach {
+ case Filter(condition, SubqueryAlias(_, tableRelation: HiveTableRelation)) =>
+ if (tableRelation.partitionCols.isEmpty) {
+ throw new KyuubiSQLExtensionException("Filters are only supported for partitioned table")
+ }
+
+ val partitionKeyIds = AttributeSet(tableRelation.partitionCols)
+ if (condition.references.isEmpty || !condition.references.subsetOf(partitionKeyIds)) {
+ throw new KyuubiSQLExtensionException("Only partition column filters are allowed")
+ }
+
+ case _ =>
+ }
+
+ protected def getTableIdentifier(tableIdent: Seq[String]): TableIdentifier = tableIdent match {
+ case Seq(tbl) => TableIdentifier.apply(tbl)
+ case Seq(db, tbl) => TableIdentifier.apply(tbl, Some(db))
+ case _ => throw new KyuubiSQLExtensionException(
+ "only support session catalog table, please use db.table instead")
+ }
+
+ override def apply(plan: LogicalPlan): LogicalPlan = plan match {
+ case statement: OptimizeZorderStatement if statement.query.resolved =>
+ checkQueryAllowed(statement.query)
+ val tableIdentifier = getTableIdentifier(statement.tableIdentifier)
+ val catalogTable = session.sessionState.catalog.getTableMetadata(tableIdentifier)
+ buildOptimizeZorderCommand(catalogTable, statement.query)
+
+ case _ => plan
+ }
+}
+
+/**
+ * Resolve `OptimizeZorderStatement` to `OptimizeZorderCommand`
+ */
+case class ResolveZorder(session: SparkSession) extends ResolveZorderBase {
+ override def buildOptimizeZorderCommand(
+ catalogTable: CatalogTable,
+ query: LogicalPlan): OptimizeZorderCommandBase = {
+ OptimizeZorderCommand(catalogTable, query)
+ }
+}
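The partition-filter restriction enforced by `checkQueryAllowed` can be modeled outside Spark. This is a hypothetical Python sketch (function name and plan structure are simplified to plain lists; only the error strings mirror the Scala rule): a WHERE clause on an `OPTIMIZE ... ZORDER BY` statement is legal only when the table is partitioned and every referenced column is a partition column.

```python
def check_query_allowed(partition_cols, filter_refs):
    """Simplified model of ResolveZorderBase.checkQueryAllowed.

    partition_cols: list of partition column names of the target table
    filter_refs: columns referenced by the WHERE clause, or None if no filter
    """
    if filter_refs is None:
        return  # no filter in the query: nothing to check
    if not partition_cols:
        raise ValueError("Filters are only supported for partitioned table")
    if not filter_refs or not set(filter_refs) <= set(partition_cols):
        raise ValueError("Only partition column filters are allowed")
```

For example, filtering on a partition column `dt` passes, while filtering an unpartitioned table, or referencing a data column, is rejected.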
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBase.scala
new file mode 100644
index 00000000000..e4d98ccbe84
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBase.scala
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode, FalseLiteral}
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.types.{BinaryType, DataType}
+
+import org.apache.kyuubi.sql.KyuubiSQLExtensionException
+
+abstract class ZorderBase extends Expression {
+ override def foldable: Boolean = children.forall(_.foldable)
+ override def nullable: Boolean = false
+ override def dataType: DataType = BinaryType
+ override def prettyName: String = "zorder"
+
+ override def checkInputDataTypes(): TypeCheckResult = {
+ try {
+ defaultNullValues
+ TypeCheckResult.TypeCheckSuccess
+ } catch {
+ case e: KyuubiSQLExtensionException =>
+ TypeCheckResult.TypeCheckFailure(e.getMessage)
+ }
+ }
+
+ @transient
+ private[this] lazy val defaultNullValues: Array[Any] =
+ children.map(_.dataType)
+ .map(ZorderBytesUtils.defaultValue)
+ .toArray
+
+ override def eval(input: InternalRow): Any = {
+ val childrenValues = children.zipWithIndex.map {
+ case (child: Expression, index) =>
+ val v = child.eval(input)
+ if (v == null) {
+ defaultNullValues(index)
+ } else {
+ v
+ }
+ }
+ ZorderBytesUtils.interleaveBits(childrenValues.toArray)
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val evals = children.map(_.genCode(ctx))
+ val defaultValues = ctx.addReferenceObj("defaultValues", defaultNullValues)
+ val values = ctx.freshName("values")
+ val util = ZorderBytesUtils.getClass.getName.stripSuffix("$")
+ val inputs = evals.zipWithIndex.map {
+ case (eval, index) =>
+ s"""
+ |${eval.code}
+ |if (${eval.isNull}) {
+ | $values[$index] = $defaultValues[$index];
+ |} else {
+ | $values[$index] = ${eval.value};
+ |}
+ |""".stripMargin
+ }
+ ev.copy(
+ code =
+ code"""
+ |byte[] ${ev.value} = null;
+ |Object[] $values = new Object[${evals.length}];
+ |${inputs.mkString("\n")}
+ |${ev.value} = $util.interleaveBits($values);
+ |""".stripMargin,
+ isNull = FalseLiteral)
+ }
+}
+
+case class Zorder(children: Seq[Expression]) extends ZorderBase {
+ protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression =
+ copy(children = newChildren)
+}
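The null handling in `ZorderBase.eval` can be sketched minimally in Python (names are illustrative, not part of the Kyuubi API): each null child value falls back to a per-type default before the values are interleaved, so rows with nulls still get a stable z-order key.

```python
def zorder_eval(child_values, default_null_values, interleave):
    # Mirrors ZorderBase.eval: None stands in for a null Spark value and is
    # replaced by the per-type default computed from the child's data type.
    filled = [default if value is None else value
              for value, default in zip(child_values, default_null_values)]
    return interleave(filled)
```

With `tuple` standing in for the real bit-interleaving step, `zorder_eval([None, 5, None], [1, 0, 9], tuple)` yields `(1, 5, 9)`.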
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBytesUtils.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBytesUtils.scala
new file mode 100644
index 00000000000..d249f1dc32f
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/kyuubi/sql/zorder/ZorderBytesUtils.scala
@@ -0,0 +1,517 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.zorder
+
+import java.lang.{Double => jDouble, Float => jFloat}
+
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+import org.apache.kyuubi.sql.KyuubiSQLExtensionException
+
+object ZorderBytesUtils {
+ final private val BIT_8_MASK = 1 << 7
+ final private val BIT_16_MASK = 1 << 15
+ final private val BIT_32_MASK = 1 << 31
+ final private val BIT_64_MASK = 1L << 63
+
+ def interleaveBits(inputs: Array[Any]): Array[Byte] = {
+ inputs.length match {
+      // a faster approach that runs in O(8 * 8),
+      // see http://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableObvious
+ case 1 => longToByte(toLong(inputs(0)))
+ case 2 => interleave2Longs(toLong(inputs(0)), toLong(inputs(1)))
+ case 3 => interleave3Longs(toLong(inputs(0)), toLong(inputs(1)), toLong(inputs(2)))
+ case 4 =>
+ interleave4Longs(toLong(inputs(0)), toLong(inputs(1)), toLong(inputs(2)), toLong(inputs(3)))
+ case 5 => interleave5Longs(
+ toLong(inputs(0)),
+ toLong(inputs(1)),
+ toLong(inputs(2)),
+ toLong(inputs(3)),
+ toLong(inputs(4)))
+ case 6 => interleave6Longs(
+ toLong(inputs(0)),
+ toLong(inputs(1)),
+ toLong(inputs(2)),
+ toLong(inputs(3)),
+ toLong(inputs(4)),
+ toLong(inputs(5)))
+ case 7 => interleave7Longs(
+ toLong(inputs(0)),
+ toLong(inputs(1)),
+ toLong(inputs(2)),
+ toLong(inputs(3)),
+ toLong(inputs(4)),
+ toLong(inputs(5)),
+ toLong(inputs(6)))
+ case 8 => interleave8Longs(
+ toLong(inputs(0)),
+ toLong(inputs(1)),
+ toLong(inputs(2)),
+ toLong(inputs(3)),
+ toLong(inputs(4)),
+ toLong(inputs(5)),
+ toLong(inputs(6)),
+ toLong(inputs(7)))
+
+ case _ =>
+        // the default approach, which runs in O(64 * n), where n is the number of inputs
+ interleaveBitsDefault(inputs.map(toByteArray))
+ }
+ }
+
+ private def interleave2Longs(l1: Long, l2: Long): Array[Byte] = {
+ // output 8 * 16 bits
+ val result = new Array[Byte](16)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toShort
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toShort
+
+ var z = 0
+ var j = 0
+ while (j < 8) {
+ val x_masked = tmp1 & (1 << j)
+ val y_masked = tmp2 & (1 << j)
+ z |= (x_masked << j)
+ z |= (y_masked << (j + 1))
+ j = j + 1
+ }
+ result((7 - i) * 2 + 1) = (z & 0xFF).toByte
+ result((7 - i) * 2) = ((z >> 8) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave3Longs(l1: Long, l2: Long, l3: Long): Array[Byte] = {
+ // output 8 * 24 bits
+ val result = new Array[Byte](24)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toInt
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toInt
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toInt
+
+ var z = 0
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ z |= (r1_mask << (2 * j)) | (r2_mask << (2 * j + 1)) | (r3_mask << (2 * j + 2))
+ j = j + 1
+ }
+ result((7 - i) * 3 + 2) = (z & 0xFF).toByte
+ result((7 - i) * 3 + 1) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 3) = ((z >> 16) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave4Longs(l1: Long, l2: Long, l3: Long, l4: Long): Array[Byte] = {
+ // output 8 * 32 bits
+ val result = new Array[Byte](32)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toInt
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toInt
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toInt
+ val tmp4 = ((l4 >> (i * 8)) & 0xFF).toInt
+
+ var z = 0
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ val r4_mask = tmp4 & (1 << j)
+ z |= (r1_mask << (3 * j)) | (r2_mask << (3 * j + 1)) | (r3_mask << (3 * j + 2)) |
+ (r4_mask << (3 * j + 3))
+ j = j + 1
+ }
+ result((7 - i) * 4 + 3) = (z & 0xFF).toByte
+ result((7 - i) * 4 + 2) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 4 + 1) = ((z >> 16) & 0xFF).toByte
+ result((7 - i) * 4) = ((z >> 24) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave5Longs(
+ l1: Long,
+ l2: Long,
+ l3: Long,
+ l4: Long,
+ l5: Long): Array[Byte] = {
+ // output 8 * 40 bits
+ val result = new Array[Byte](40)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toLong
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toLong
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toLong
+ val tmp4 = ((l4 >> (i * 8)) & 0xFF).toLong
+ val tmp5 = ((l5 >> (i * 8)) & 0xFF).toLong
+
+ var z = 0L
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ val r4_mask = tmp4 & (1 << j)
+ val r5_mask = tmp5 & (1 << j)
+ z |= (r1_mask << (4 * j)) | (r2_mask << (4 * j + 1)) | (r3_mask << (4 * j + 2)) |
+ (r4_mask << (4 * j + 3)) | (r5_mask << (4 * j + 4))
+ j = j + 1
+ }
+ result((7 - i) * 5 + 4) = (z & 0xFF).toByte
+ result((7 - i) * 5 + 3) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 5 + 2) = ((z >> 16) & 0xFF).toByte
+ result((7 - i) * 5 + 1) = ((z >> 24) & 0xFF).toByte
+ result((7 - i) * 5) = ((z >> 32) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave6Longs(
+ l1: Long,
+ l2: Long,
+ l3: Long,
+ l4: Long,
+ l5: Long,
+ l6: Long): Array[Byte] = {
+ // output 8 * 48 bits
+ val result = new Array[Byte](48)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toLong
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toLong
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toLong
+ val tmp4 = ((l4 >> (i * 8)) & 0xFF).toLong
+ val tmp5 = ((l5 >> (i * 8)) & 0xFF).toLong
+ val tmp6 = ((l6 >> (i * 8)) & 0xFF).toLong
+
+ var z = 0L
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ val r4_mask = tmp4 & (1 << j)
+ val r5_mask = tmp5 & (1 << j)
+ val r6_mask = tmp6 & (1 << j)
+ z |= (r1_mask << (5 * j)) | (r2_mask << (5 * j + 1)) | (r3_mask << (5 * j + 2)) |
+ (r4_mask << (5 * j + 3)) | (r5_mask << (5 * j + 4)) | (r6_mask << (5 * j + 5))
+ j = j + 1
+ }
+ result((7 - i) * 6 + 5) = (z & 0xFF).toByte
+ result((7 - i) * 6 + 4) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 6 + 3) = ((z >> 16) & 0xFF).toByte
+ result((7 - i) * 6 + 2) = ((z >> 24) & 0xFF).toByte
+ result((7 - i) * 6 + 1) = ((z >> 32) & 0xFF).toByte
+ result((7 - i) * 6) = ((z >> 40) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave7Longs(
+ l1: Long,
+ l2: Long,
+ l3: Long,
+ l4: Long,
+ l5: Long,
+ l6: Long,
+ l7: Long): Array[Byte] = {
+ // output 8 * 56 bits
+ val result = new Array[Byte](56)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toLong
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toLong
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toLong
+ val tmp4 = ((l4 >> (i * 8)) & 0xFF).toLong
+ val tmp5 = ((l5 >> (i * 8)) & 0xFF).toLong
+ val tmp6 = ((l6 >> (i * 8)) & 0xFF).toLong
+ val tmp7 = ((l7 >> (i * 8)) & 0xFF).toLong
+
+ var z = 0L
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ val r4_mask = tmp4 & (1 << j)
+ val r5_mask = tmp5 & (1 << j)
+ val r6_mask = tmp6 & (1 << j)
+ val r7_mask = tmp7 & (1 << j)
+ z |= (r1_mask << (6 * j)) | (r2_mask << (6 * j + 1)) | (r3_mask << (6 * j + 2)) |
+ (r4_mask << (6 * j + 3)) | (r5_mask << (6 * j + 4)) | (r6_mask << (6 * j + 5)) |
+ (r7_mask << (6 * j + 6))
+ j = j + 1
+ }
+ result((7 - i) * 7 + 6) = (z & 0xFF).toByte
+ result((7 - i) * 7 + 5) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 7 + 4) = ((z >> 16) & 0xFF).toByte
+ result((7 - i) * 7 + 3) = ((z >> 24) & 0xFF).toByte
+ result((7 - i) * 7 + 2) = ((z >> 32) & 0xFF).toByte
+ result((7 - i) * 7 + 1) = ((z >> 40) & 0xFF).toByte
+ result((7 - i) * 7) = ((z >> 48) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ private def interleave8Longs(
+ l1: Long,
+ l2: Long,
+ l3: Long,
+ l4: Long,
+ l5: Long,
+ l6: Long,
+ l7: Long,
+ l8: Long): Array[Byte] = {
+ // output 8 * 64 bits
+ val result = new Array[Byte](64)
+ var i = 0
+ while (i < 8) {
+ val tmp1 = ((l1 >> (i * 8)) & 0xFF).toLong
+ val tmp2 = ((l2 >> (i * 8)) & 0xFF).toLong
+ val tmp3 = ((l3 >> (i * 8)) & 0xFF).toLong
+ val tmp4 = ((l4 >> (i * 8)) & 0xFF).toLong
+ val tmp5 = ((l5 >> (i * 8)) & 0xFF).toLong
+ val tmp6 = ((l6 >> (i * 8)) & 0xFF).toLong
+ val tmp7 = ((l7 >> (i * 8)) & 0xFF).toLong
+ val tmp8 = ((l8 >> (i * 8)) & 0xFF).toLong
+
+ var z = 0L
+ var j = 0
+ while (j < 8) {
+ val r1_mask = tmp1 & (1 << j)
+ val r2_mask = tmp2 & (1 << j)
+ val r3_mask = tmp3 & (1 << j)
+ val r4_mask = tmp4 & (1 << j)
+ val r5_mask = tmp5 & (1 << j)
+ val r6_mask = tmp6 & (1 << j)
+ val r7_mask = tmp7 & (1 << j)
+ val r8_mask = tmp8 & (1 << j)
+ z |= (r1_mask << (7 * j)) | (r2_mask << (7 * j + 1)) | (r3_mask << (7 * j + 2)) |
+ (r4_mask << (7 * j + 3)) | (r5_mask << (7 * j + 4)) | (r6_mask << (7 * j + 5)) |
+ (r7_mask << (7 * j + 6)) | (r8_mask << (7 * j + 7))
+ j = j + 1
+ }
+ result((7 - i) * 8 + 7) = (z & 0xFF).toByte
+ result((7 - i) * 8 + 6) = ((z >> 8) & 0xFF).toByte
+ result((7 - i) * 8 + 5) = ((z >> 16) & 0xFF).toByte
+ result((7 - i) * 8 + 4) = ((z >> 24) & 0xFF).toByte
+ result((7 - i) * 8 + 3) = ((z >> 32) & 0xFF).toByte
+ result((7 - i) * 8 + 2) = ((z >> 40) & 0xFF).toByte
+ result((7 - i) * 8 + 1) = ((z >> 48) & 0xFF).toByte
+ result((7 - i) * 8) = ((z >> 56) & 0xFF).toByte
+ i = i + 1
+ }
+ result
+ }
+
+ def interleaveBitsDefault(arrays: Array[Array[Byte]]): Array[Byte] = {
+ var totalLength = 0
+ var maxLength = 0
+ arrays.foreach { array =>
+ totalLength += array.length
+ maxLength = maxLength.max(array.length * 8)
+ }
+ val result = new Array[Byte](totalLength)
+ var resultBit = 0
+
+ var bit = 0
+ while (bit < maxLength) {
+ val bytePos = bit / 8
+ val bitPos = bit % 8
+
+ for (arr <- arrays) {
+ val len = arr.length
+ if (bytePos < len) {
+ val resultBytePos = totalLength - 1 - resultBit / 8
+ val resultBitPos = resultBit % 8
+ result(resultBytePos) =
+ updatePos(result(resultBytePos), resultBitPos, arr(len - 1 - bytePos), bitPos)
+ resultBit += 1
+ }
+ }
+ bit += 1
+ }
+ result
+ }
+
+ def updatePos(a: Byte, apos: Int, b: Byte, bpos: Int): Byte = {
+ var temp = (b & (1 << bpos)).toByte
+ if (apos > bpos) {
+ temp = (temp << (apos - bpos)).toByte
+ } else if (apos < bpos) {
+ temp = (temp >> (bpos - apos)).toByte
+ }
+ val atemp = (a & (1 << apos)).toByte
+ if (atemp == temp) {
+ return a
+ }
+ (a ^ (1 << apos)).toByte
+ }
+
+ def toLong(a: Any): Long = {
+ a match {
+ case b: Boolean => (if (b) 1 else 0).toLong ^ BIT_64_MASK
+ case b: Byte => b.toLong ^ BIT_64_MASK
+ case s: Short => s.toLong ^ BIT_64_MASK
+ case i: Int => i.toLong ^ BIT_64_MASK
+ case l: Long => l ^ BIT_64_MASK
+ case f: Float => java.lang.Float.floatToRawIntBits(f).toLong ^ BIT_64_MASK
+ case d: Double => java.lang.Double.doubleToRawLongBits(d) ^ BIT_64_MASK
+ case str: UTF8String => str.getPrefix
+ case dec: Decimal => dec.toLong ^ BIT_64_MASK
+ case other: Any =>
+ throw new KyuubiSQLExtensionException("Unsupported z-order type: " + other.getClass)
+ }
+ }
+
+ def toByteArray(a: Any): Array[Byte] = {
+ a match {
+ case bo: Boolean =>
+ booleanToByte(bo)
+ case b: Byte =>
+ byteToByte(b)
+ case s: Short =>
+ shortToByte(s)
+ case i: Int =>
+ intToByte(i)
+ case l: Long =>
+ longToByte(l)
+ case f: Float =>
+ floatToByte(f)
+ case d: Double =>
+ doubleToByte(d)
+ case str: UTF8String =>
+        // truncate or pad str to 8 bytes
+ paddingTo8Byte(str.getBytes)
+ case dec: Decimal =>
+ longToByte(dec.toLong)
+ case other: Any =>
+ throw new KyuubiSQLExtensionException("Unsupported z-order type: " + other.getClass)
+ }
+ }
+
+ def booleanToByte(a: Boolean): Array[Byte] = {
+ if (a) {
+ byteToByte(1.toByte)
+ } else {
+ byteToByte(0.toByte)
+ }
+ }
+
+ def byteToByte(a: Byte): Array[Byte] = {
+ val tmp = (a ^ BIT_8_MASK).toByte
+ Array(tmp)
+ }
+
+ def shortToByte(a: Short): Array[Byte] = {
+ val tmp = a ^ BIT_16_MASK
+ Array(((tmp >> 8) & 0xFF).toByte, (tmp & 0xFF).toByte)
+ }
+
+ def intToByte(a: Int): Array[Byte] = {
+ val result = new Array[Byte](4)
+ var i = 0
+ val tmp = a ^ BIT_32_MASK
+ while (i <= 3) {
+ val offset = i * 8
+ result(3 - i) = ((tmp >> offset) & 0xFF).toByte
+ i += 1
+ }
+ result
+ }
+
+ def longToByte(a: Long): Array[Byte] = {
+ val result = new Array[Byte](8)
+ var i = 0
+ val tmp = a ^ BIT_64_MASK
+ while (i <= 7) {
+ val offset = i * 8
+ result(7 - i) = ((tmp >> offset) & 0xFF).toByte
+ i += 1
+ }
+ result
+ }
+
+ def floatToByte(a: Float): Array[Byte] = {
+ val fi = jFloat.floatToRawIntBits(a)
+ intToByte(fi)
+ }
+
+ def doubleToByte(a: Double): Array[Byte] = {
+ val dl = jDouble.doubleToRawLongBits(a)
+ longToByte(dl)
+ }
+
+ def paddingTo8Byte(a: Array[Byte]): Array[Byte] = {
+ val len = a.length
+ if (len == 8) {
+ a
+ } else if (len > 8) {
+ val result = new Array[Byte](8)
+ System.arraycopy(a, 0, result, 0, 8)
+ result
+ } else {
+ val result = new Array[Byte](8)
+ System.arraycopy(a, 0, result, 8 - len, len)
+ result
+ }
+ }
+
+ def defaultByteArrayValue(dataType: DataType): Array[Byte] = toByteArray {
+ defaultValue(dataType)
+ }
+
+ def defaultValue(dataType: DataType): Any = {
+ dataType match {
+ case BooleanType =>
+ true
+ case ByteType =>
+ Byte.MaxValue
+ case ShortType =>
+ Short.MaxValue
+ case IntegerType | DateType =>
+ Int.MaxValue
+ case LongType | TimestampType | _: DecimalType =>
+ Long.MaxValue
+ case FloatType =>
+ Float.MaxValue
+ case DoubleType =>
+ Double.MaxValue
+ case StringType =>
+        // we pad strings to 8 bytes, so the default value is equivalent to a long
+ UTF8String.fromBytes(longToByte(Long.MaxValue))
+ case other: Any =>
+ throw new KyuubiSQLExtensionException(s"Unsupported z-order type: ${other.catalogString}")
+ }
+ }
+}
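Two ideas in the utility above can be illustrated with a small hand-rolled Python sketch (an illustration, not Kyuubi code): XOR-ing the sign bit maps signed ordering onto unsigned byte ordering (the trick behind `toLong` and the `longToByte` family), and bit interleaving takes one bit from each input in turn, MSB first, which is what produces the z-order curve.

```python
BIT_64_MASK = 1 << 63

def to_ordered_long(v):
    # Flipping the sign bit makes unsigned comparison of the result agree
    # with signed comparison of the input: -1 maps below 0, 0 below 1, etc.
    return (v ^ BIT_64_MASK) & 0xFFFFFFFFFFFFFFFF

def interleave2(a, b, width=8):
    # MSB-first bit interleaving of two unsigned ints of `width` bits each;
    # bits of `a` land in the higher position of each output bit pair.
    out = 0
    for i in reversed(range(width)):
        out = (out << 1) | ((a >> i) & 1)
        out = (out << 1) | ((b >> i) & 1)
    return out
```

The specialized `interleave2Longs` .. `interleave8Longs` methods compute the same result as this loop, but byte-at-a-time for speed, per the Bit Twiddling Hacks reference cited in the code.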
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala
new file mode 100644
index 00000000000..81873476cc4
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/FinalStageResourceManager.scala
@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.annotation.tailrec
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{ExecutorAllocationClient, MapOutputTrackerMaster, SparkContext, SparkEnv}
+import org.apache.spark.internal.Logging
+import org.apache.spark.resource.ResourceProfile
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SortExec, SparkPlan}
+import org.apache.spark.sql.execution.adaptive._
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
+import org.apache.spark.sql.execution.command.DataWritingCommandExec
+import org.apache.spark.sql.execution.datasources.WriteFilesExec
+import org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec
+import org.apache.spark.sql.execution.exchange.{ENSURE_REQUIREMENTS, ShuffleExchangeExec}
+
+import org.apache.kyuubi.sql.{KyuubiSQLConf, WriteUtils}
+
+/**
+ * This rule assumes the final write stage requires fewer cores than the previous stages;
+ * otherwise it takes no effect.
+ *
+ * It provides one feature:
+ * 1. Kill redundant executors before running the final write stage
+ */
+case class FinalStageResourceManager(session: SparkSession)
+ extends Rule[SparkPlan] with FinalRebalanceStageHelper {
+ override def apply(plan: SparkPlan): SparkPlan = {
+ if (!conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_ENABLED)) {
+ return plan
+ }
+
+ if (!WriteUtils.isWrite(session, plan)) {
+ return plan
+ }
+
+ val sc = session.sparkContext
+ val dra = sc.getConf.getBoolean("spark.dynamicAllocation.enabled", false)
+ val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
+ val minExecutors = sc.getConf.getInt("spark.dynamicAllocation.minExecutors", 0)
+ val maxExecutors = sc.getConf.getInt("spark.dynamicAllocation.maxExecutors", Int.MaxValue)
+ val factor = conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_PARTITION_FACTOR)
+ val hasImprovementRoom = maxExecutors - 1 > minExecutors * factor
+    // Fail fast if:
+    // 1. DRA is off
+    // 2. the scheduler backend is not coarse-grained (this rule only works with YARN and K8s)
+    // 3. maxExecutors is not bigger than minExecutors * factor
+ if (!dra || !sc.schedulerBackend.isInstanceOf[CoarseGrainedSchedulerBackend] ||
+ !hasImprovementRoom) {
+ return plan
+ }
+
+ val stageOpt = findFinalRebalanceStage(plan)
+ if (stageOpt.isEmpty) {
+ return plan
+ }
+
+    // It's not safe to kill executors if this plan contains a table cache.
+    // If an executor is lost, the cached RDD partitions it holds must be re-computed.
+ if (hasTableCache(plan) &&
+ conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_SKIP_KILLING_EXECUTORS_FOR_TABLE_CACHE)) {
+ return plan
+ }
+
+ // TODO: move this to query stage optimizer when updating Spark to 3.5.x
+ // Since we are in `prepareQueryStage`, the AQE shuffle read has not been applied.
+    // So we need to apply it ourselves.
+ val shuffleRead = queryStageOptimizerRules.foldLeft(stageOpt.get.asInstanceOf[SparkPlan]) {
+ case (latest, rule) => rule.apply(latest)
+ }
+ val (targetCores, stage) = shuffleRead match {
+ case AQEShuffleReadExec(stage: ShuffleQueryStageExec, partitionSpecs) =>
+ (partitionSpecs.length, stage)
+ case stage: ShuffleQueryStageExec =>
+ // we can still kill executors if no AQE shuffle read, e.g., `.repartition(2)`
+ (stage.shuffle.numPartitions, stage)
+ case _ =>
+        // this should never happen in current Spark, but to be safe, do nothing if it does
+ logWarning("BUG, Please report to Apache Kyuubi community")
+ return plan
+ }
+    // The conditions for killing redundant executors:
+ // - target executors < active executors
+ // - active executors - target executors > min executors
+ val numActiveExecutors = sc.getExecutorIds().length
+ val targetExecutors = (math.ceil(targetCores.toFloat / coresPerExecutor) * factor).toInt
+ .max(1)
+ val hasBenefits = targetExecutors < numActiveExecutors &&
+ (numActiveExecutors - targetExecutors) > minExecutors
+ logInfo(s"The snapshot of current executors view, " +
+ s"active executors: $numActiveExecutors, min executor: $minExecutors, " +
+ s"target executors: $targetExecutors, has benefits: $hasBenefits")
+ if (hasBenefits) {
+ val shuffleId = stage.plan.asInstanceOf[ShuffleExchangeExec].shuffleDependency.shuffleId
+ val numReduce = stage.plan.asInstanceOf[ShuffleExchangeExec].numPartitions
+      // Now, only the final rebalance stage is waiting to execute and all tasks of the
+      // previous stage are finished. Kill redundant existing executors eagerly so the tasks
+      // of the final stage can be scheduled in a centralized way.
+ killExecutors(sc, targetExecutors, shuffleId, numReduce)
+ }
+
+ plan
+ }
+
+ /**
+   * The priority for killing executors is:
+   * 1. kill younger executors first (the longer an executor lives, the better JIT works)
+   * 2. kill executors that produced less shuffle data first
+ */
+ private def findExecutorToKill(
+ sc: SparkContext,
+ targetExecutors: Int,
+ shuffleId: Int,
+ numReduce: Int): Seq[String] = {
+ val tracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+ val shuffleStatusOpt = tracker.shuffleStatuses.get(shuffleId)
+ if (shuffleStatusOpt.isEmpty) {
+ return Seq.empty
+ }
+ val shuffleStatus = shuffleStatusOpt.get
+ val executorToBlockSize = new mutable.HashMap[String, Long]
+ shuffleStatus.withMapStatuses { mapStatus =>
+ mapStatus.foreach { status =>
+ var i = 0
+ var sum = 0L
+ while (i < numReduce) {
+ sum += status.getSizeForBlock(i)
+ i += 1
+ }
+ executorToBlockSize.getOrElseUpdate(status.location.executorId, sum)
+ }
+ }
+
+ val backend = sc.schedulerBackend.asInstanceOf[CoarseGrainedSchedulerBackend]
+ val executorsWithRegistrationTs = backend.getExecutorsWithRegistrationTs()
+ val existedExecutors = executorsWithRegistrationTs.keys.toSet
+ val expectedNumExecutorToKill = existedExecutors.size - targetExecutors
+ if (expectedNumExecutorToKill < 1) {
+ return Seq.empty
+ }
+
+ val executorIdsToKill = new ArrayBuffer[String]()
+    // We first kill executors that hold no shuffle blocks. This can happen when the last
+    // stage ran fast and finished in a short time, while the existing executors are left over
+    // from previous stages and have not been killed by DRA yet, so they can not be found by
+    // tracking the shuffle status.
+    // Evict executors by their alive time first, and retain all executors that have better
+    // locality for shuffle blocks.
+ executorsWithRegistrationTs.toSeq.sortBy(_._2).foreach { case (id, _) =>
+ if (executorIdsToKill.length < expectedNumExecutorToKill &&
+ !executorToBlockSize.contains(id)) {
+ executorIdsToKill.append(id)
+ }
+ }
+
+    // Evict the remaining executors in ascending order of shuffle block size
+ executorToBlockSize.toSeq.sortBy(_._2).foreach { case (id, _) =>
+ if (executorIdsToKill.length < expectedNumExecutorToKill && existedExecutors.contains(id)) {
+ executorIdsToKill.append(id)
+ }
+ }
+
+ executorIdsToKill.toSeq
+ }
+
+ private def killExecutors(
+ sc: SparkContext,
+ targetExecutors: Int,
+ shuffleId: Int,
+ numReduce: Int): Unit = {
+ val executorAllocationClient = sc.schedulerBackend.asInstanceOf[ExecutorAllocationClient]
+
+ val executorsToKill =
+ if (conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL)) {
+ executorAllocationClient.getExecutorIds()
+ } else {
+ findExecutorToKill(sc, targetExecutors, shuffleId, numReduce)
+ }
+ logInfo(s"Request to kill executors, total count ${executorsToKill.size}, " +
+ s"[${executorsToKill.mkString(", ")}].")
+ if (executorsToKill.isEmpty) {
+ return
+ }
+
+    // Note, `SparkContext#killExecutors` is not allowed when DRA is enabled,
+    // see `https://github.com/apache/spark/pull/20604`.
+    // Killing executors here may leave the status in `ExecutorAllocationManager` inconsistent
+    // with `CoarseGrainedSchedulerBackend` for a while, but they eventually converge.
+    //
+    // We should adjust the target number of executors, otherwise `YarnAllocator` might
+    // re-request the original target if DRA has not updated its target executors yet.
+    // Note, DRA re-adjusts executors when there are more tasks to execute, so we are safe.
+ //
+    // * We kill executors
+    // * YarnAllocator re-requests the original target executors
+    // * DRA can not release the executors since they are newly added
+ // ----------------------------------------------------------------> timeline
+ executorAllocationClient.killExecutors(
+ executorIds = executorsToKill,
+ adjustTargetNumExecutors = true,
+ countFailures = false,
+ force = false)
+
+ FinalStageResourceManager.getAdjustedTargetExecutors(sc)
+ .filter(_ < targetExecutors).foreach { adjustedExecutors =>
+ val delta = targetExecutors - adjustedExecutors
+ logInfo(s"Target executors after kill ($adjustedExecutors) is lower than required " +
+ s"($targetExecutors). Requesting $delta additional executor(s).")
+ executorAllocationClient.requestExecutors(delta)
+ }
+ }
+
+ @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
+ OptimizeSkewInRebalancePartitions,
+ CoalesceShufflePartitions(session),
+ OptimizeShuffleWithLocalRead)
+}
+
+object FinalStageResourceManager extends Logging {
+
+ private[sql] def getAdjustedTargetExecutors(sc: SparkContext): Option[Int] = {
+ sc.schedulerBackend match {
+ case schedulerBackend: CoarseGrainedSchedulerBackend =>
+ try {
+ val field = classOf[CoarseGrainedSchedulerBackend]
+ .getDeclaredField("requestedTotalExecutorsPerResourceProfile")
+ field.setAccessible(true)
+ schedulerBackend.synchronized {
+ val requestedTotalExecutorsPerResourceProfile =
+ field.get(schedulerBackend).asInstanceOf[mutable.HashMap[ResourceProfile, Int]]
+ val defaultRp = sc.resourceProfileManager.defaultResourceProfile
+ requestedTotalExecutorsPerResourceProfile.get(defaultRp)
+ }
+ } catch {
+ case e: Exception =>
+ logWarning("Failed to get requestedTotalExecutors of Default ResourceProfile", e)
+ None
+ }
+ case _ => None
+ }
+ }
+}
+
+trait FinalRebalanceStageHelper extends AdaptiveSparkPlanHelper {
+ @tailrec
+ final protected def findFinalRebalanceStage(plan: SparkPlan): Option[ShuffleQueryStageExec] = {
+ plan match {
+ case write: DataWritingCommandExec => findFinalRebalanceStage(write.child)
+ case write: V2TableWriteExec => findFinalRebalanceStage(write.child)
+ case write: WriteFilesExec => findFinalRebalanceStage(write.child)
+ case p: ProjectExec => findFinalRebalanceStage(p.child)
+ case f: FilterExec => findFinalRebalanceStage(f.child)
+ case s: SortExec if !s.global => findFinalRebalanceStage(s.child)
+ case stage: ShuffleQueryStageExec
+ if stage.isMaterialized && stage.mapStats.isDefined &&
+ stage.plan.isInstanceOf[ShuffleExchangeExec] &&
+ stage.plan.asInstanceOf[ShuffleExchangeExec].shuffleOrigin != ENSURE_REQUIREMENTS =>
+ Some(stage)
+ case _ => None
+ }
+ }
+
+ final protected def hasTableCache(plan: SparkPlan): Boolean = {
+ find(plan) {
+ case _: InMemoryTableScanExec => true
+ case _ => false
+ }.isDefined
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/InjectCustomResourceProfile.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/InjectCustomResourceProfile.scala
new file mode 100644
index 00000000000..64421d6bfab
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/InjectCustomResourceProfile.scala
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{CustomResourceProfileExec, SparkPlan}
+import org.apache.spark.sql.execution.adaptive._
+
+import org.apache.kyuubi.sql.{KyuubiSQLConf, WriteUtils}
+
+/**
+ * Injects a custom resource profile for the final write stage, so that custom
+ * executor resource configs can be specified.
+ */
+case class InjectCustomResourceProfile(session: SparkSession)
+ extends Rule[SparkPlan] with FinalRebalanceStageHelper {
+ override def apply(plan: SparkPlan): SparkPlan = {
+ if (!conf.getConf(KyuubiSQLConf.FINAL_WRITE_STAGE_RESOURCE_ISOLATION_ENABLED)) {
+ return plan
+ }
+
+ if (!WriteUtils.isWrite(session, plan)) {
+ return plan
+ }
+
+ val stage = findFinalRebalanceStage(plan)
+ if (stage.isEmpty) {
+ return plan
+ }
+
+    // TODO: Ideally, we could call `CoarseGrainedSchedulerBackend.requestTotalExecutors` eagerly
+    //  to reduce the task submission pending time, but that may lose task locality.
+    //
+    // By default, executors are requested when the stage submission event is caught.
+ injectCustomResourceProfile(plan, stage.get.id)
+ }
+
+ private def injectCustomResourceProfile(plan: SparkPlan, id: Int): SparkPlan = {
+ plan match {
+ case stage: ShuffleQueryStageExec if stage.id == id =>
+ CustomResourceProfileExec(stage)
+ case _ => plan.mapChildren(child => injectCustomResourceProfile(child, id))
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/PruneFileSourcePartitionHelper.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/PruneFileSourcePartitionHelper.scala
new file mode 100644
index 00000000000..ce496eb474c
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/PruneFileSourcePartitionHelper.scala
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSet, Expression, ExpressionSet, PredicateHelper, SubqueryExpression}
+import org.apache.spark.sql.catalyst.plans.logical.LeafNode
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.types.StructType
+
+trait PruneFileSourcePartitionHelper extends PredicateHelper {
+
+ def getPartitionKeyFiltersAndDataFilters(
+ sparkSession: SparkSession,
+ relation: LeafNode,
+ partitionSchema: StructType,
+ filters: Seq[Expression],
+ output: Seq[AttributeReference]): (ExpressionSet, Seq[Expression]) = {
+ val normalizedFilters = DataSourceStrategy.normalizeExprs(
+ filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)),
+ output)
+ val partitionColumns =
+ relation.resolve(partitionSchema, sparkSession.sessionState.analyzer.resolver)
+ val partitionSet = AttributeSet(partitionColumns)
+ val (partitionFilters, dataFilters) = normalizedFilters.partition(f =>
+ f.references.subsetOf(partitionSet))
+ val extraPartitionFilter =
+ dataFilters.flatMap(extractPredicatesWithinOutputSet(_, partitionSet))
+
+ (ExpressionSet(partitionFilters ++ extraPartitionFilter), dataFilters)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/execution/CustomResourceProfileExec.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/execution/CustomResourceProfileExec.scala
new file mode 100644
index 00000000000..3698140fbd0
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/main/scala/org/apache/spark/sql/execution/CustomResourceProfileExec.scala
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.network.util.{ByteUnit, JavaUtils}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.Utils
+
+import org.apache.kyuubi.sql.KyuubiSQLConf._
+
+/**
+ * This node wraps the final executed plan and injects a custom resource profile into the RDD.
+ * It assumes that the produced RDD creates the `ResultStage` in `DAGScheduler`,
+ * so it provides resource isolation between the previous stages and the final stage.
+ *
+ * Note that Spark does not support configuring `minExecutors` per resource profile,
+ * which means it retains `minExecutors` executors for each resource profile.
+ * It is therefore suggested to set `spark.dynamicAllocation.minExecutors` to 0
+ * when enabling this feature.
+ */
+case class CustomResourceProfileExec(child: SparkPlan) extends UnaryExecNode {
+ override def output: Seq[Attribute] = child.output
+ override def outputPartitioning: Partitioning = child.outputPartitioning
+ override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+ override def supportsColumnar: Boolean = child.supportsColumnar
+ override def supportsRowBased: Boolean = child.supportsRowBased
+ override protected def doCanonicalize(): SparkPlan = child.canonicalized
+
+ private val executorCores = conf.getConf(FINAL_WRITE_STAGE_EXECUTOR_CORES).getOrElse(
+ sparkContext.getConf.getInt("spark.executor.cores", 1))
+ private val executorMemory = conf.getConf(FINAL_WRITE_STAGE_EXECUTOR_MEMORY).getOrElse(
+ sparkContext.getConf.get("spark.executor.memory", "2G"))
+ private val executorMemoryOverhead =
+ conf.getConf(FINAL_WRITE_STAGE_EXECUTOR_MEMORY_OVERHEAD)
+ .getOrElse(sparkContext.getConf.get("spark.executor.memoryOverhead", "1G"))
+ private val executorOffHeapMemory = conf.getConf(FINAL_WRITE_STAGE_EXECUTOR_OFF_HEAP_MEMORY)
+
+ override lazy val metrics: Map[String, SQLMetric] = {
+ val base = Map(
+ "executorCores" -> SQLMetrics.createMetric(sparkContext, "executor cores"),
+ "executorMemory" -> SQLMetrics.createMetric(sparkContext, "executor memory (MiB)"),
+ "executorMemoryOverhead" -> SQLMetrics.createMetric(
+ sparkContext,
+ "executor memory overhead (MiB)"))
+ val addition = executorOffHeapMemory.map(_ =>
+ "executorOffHeapMemory" ->
+ SQLMetrics.createMetric(sparkContext, "executor off heap memory (MiB)")).toMap
+ base ++ addition
+ }
+
+ private def wrapResourceProfile[T](rdd: RDD[T]): RDD[T] = {
+ if (Utils.isTesting) {
+ // do nothing for local testing
+ return rdd
+ }
+
+ metrics("executorCores") += executorCores
+ metrics("executorMemory") += JavaUtils.byteStringAs(executorMemory, ByteUnit.MiB)
+ metrics("executorMemoryOverhead") += JavaUtils.byteStringAs(
+ executorMemoryOverhead,
+ ByteUnit.MiB)
+ executorOffHeapMemory.foreach(m =>
+ metrics("executorOffHeapMemory") += JavaUtils.byteStringAs(m, ByteUnit.MiB))
+
+ val executionId = sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+ SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, metrics.values.toSeq)
+
+ val resourceProfileBuilder = new ResourceProfileBuilder()
+ val executorResourceRequests = new ExecutorResourceRequests()
+ executorResourceRequests.cores(executorCores)
+ executorResourceRequests.memory(executorMemory)
+ executorResourceRequests.memoryOverhead(executorMemoryOverhead)
+ executorOffHeapMemory.foreach(executorResourceRequests.offHeapMemory)
+ resourceProfileBuilder.require(executorResourceRequests)
+ rdd.withResources(resourceProfileBuilder.build())
+ rdd
+ }
+
+ override protected def doExecute(): RDD[InternalRow] = {
+ val rdd = child.execute()
+ wrapResourceProfile(rdd)
+ }
+
+ override protected def doExecuteColumnar(): RDD[ColumnarBatch] = {
+ val rdd = child.executeColumnar()
+ wrapResourceProfile(rdd)
+ }
+
+ override protected def withNewChildInternal(newChild: SparkPlan): SparkPlan = {
+ this.copy(child = newChild)
+ }
+}
diff --git a/extensions/spark/kyuubi-spark-connector-kudu/src/test/resources/log4j2-test.xml b/extensions/spark/kyuubi-extension-spark-3-4/src/test/resources/log4j2-test.xml
similarity index 100%
rename from extensions/spark/kyuubi-spark-connector-kudu/src/test/resources/log4j2-test.xml
rename to extensions/spark/kyuubi-extension-spark-3-4/src/test/resources/log4j2-test.xml
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/DropIgnoreNonexistentSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/DropIgnoreNonexistentSuite.scala
new file mode 100644
index 00000000000..bbc61fb4408
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/DropIgnoreNonexistentSuite.scala
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.plans.logical.{DropNamespace, NoopCommand}
+import org.apache.spark.sql.execution.command._
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+class DropIgnoreNonexistentSuite extends KyuubiSparkSQLExtensionTest {
+
+ test("drop ignore nonexistent") {
+ withSQLConf(KyuubiSQLConf.DROP_IGNORE_NONEXISTENT.key -> "true") {
+ // drop nonexistent database
+ val df1 = sql("DROP DATABASE nonexistent_database")
+ assert(df1.queryExecution.analyzed.asInstanceOf[DropNamespace].ifExists == true)
+
+ // drop nonexistent function
+ val df4 = sql("DROP FUNCTION nonexistent_function")
+ assert(df4.queryExecution.analyzed.isInstanceOf[NoopCommand])
+
+ // drop nonexistent PARTITION
+ withTable("test") {
+ sql("CREATE TABLE IF NOT EXISTS test(i int) PARTITIONED BY (p int)")
+ val df5 = sql("ALTER TABLE test DROP PARTITION (p = 1)")
+ assert(df5.queryExecution.analyzed
+ .asInstanceOf[AlterTableDropPartitionCommand].ifExists == true)
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageConfigIsolationSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageConfigIsolationSuite.scala
new file mode 100644
index 00000000000..96c8ae6e8b0
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageConfigIsolationSuite.scala
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.execution.adaptive.{AQEShuffleReadExec, QueryStageExec}
+import org.apache.spark.sql.internal.SQLConf
+
+import org.apache.kyuubi.sql.{FinalStageConfigIsolation, KyuubiSQLConf}
+
+class FinalStageConfigIsolationSuite extends KyuubiSparkSQLExtensionTest {
+ override protected def beforeAll(): Unit = {
+ super.beforeAll()
+ setupData()
+ }
+
+ test("final stage config set reset check") {
+ withSQLConf(
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key -> "true",
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION_WRITE_ONLY.key -> "false",
+ "spark.sql.finalStage.adaptive.coalescePartitions.minPartitionNum" -> "1",
+ "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes" -> "100") {
+      // use a loop to double-check that final stage configs do not leak between SQL queries
+ (1 to 3).foreach { _ =>
+ sql("SELECT COUNT(*) FROM VALUES(1) as t(c)").collect()
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.previousStage.adaptive.coalescePartitions.minPartitionNum") ===
+ FinalStageConfigIsolation.INTERNAL_UNSET_CONFIG_TAG)
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.adaptive.coalescePartitions.minPartitionNum") ===
+ "1")
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.finalStage.adaptive.coalescePartitions.minPartitionNum") ===
+ "1")
+
+ // 64MB
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.previousStage.adaptive.advisoryPartitionSizeInBytes") ===
+ "67108864b")
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.adaptive.advisoryPartitionSizeInBytes") ===
+ "100")
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes") ===
+ "100")
+ }
+
+ sql("SET spark.sql.adaptive.advisoryPartitionSizeInBytes=1")
+ assert(spark.sessionState.conf.getConfString(
+ "spark.sql.adaptive.advisoryPartitionSizeInBytes") ===
+ "1")
+ assert(!spark.sessionState.conf.contains(
+ "spark.sql.previousStage.adaptive.advisoryPartitionSizeInBytes"))
+
+ sql("SET a=1")
+ assert(spark.sessionState.conf.getConfString("a") === "1")
+
+ sql("RESET spark.sql.adaptive.coalescePartitions.minPartitionNum")
+ assert(!spark.sessionState.conf.contains(
+ "spark.sql.adaptive.coalescePartitions.minPartitionNum"))
+ assert(!spark.sessionState.conf.contains(
+ "spark.sql.previousStage.adaptive.coalescePartitions.minPartitionNum"))
+
+ sql("RESET a")
+ assert(!spark.sessionState.conf.contains("a"))
+ }
+ }
+
+ test("final stage config isolation") {
+ def checkPartitionNum(
+ sqlString: String,
+ previousPartitionNum: Int,
+ finalPartitionNum: Int): Unit = {
+ val df = sql(sqlString)
+ df.collect()
+ val shuffleReaders = collect(df.queryExecution.executedPlan) {
+ case customShuffleReader: AQEShuffleReadExec => customShuffleReader
+ }
+ assert(shuffleReaders.nonEmpty)
+      // sort the stages by stage id to ensure we pick the right stage
+ val sortedShuffleReaders = shuffleReaders.sortWith {
+ case (s1, s2) =>
+ s1.child.asInstanceOf[QueryStageExec].id < s2.child.asInstanceOf[QueryStageExec].id
+ }
+ if (sortedShuffleReaders.length > 1) {
+ assert(sortedShuffleReaders.head.partitionSpecs.length === previousPartitionNum)
+ }
+ assert(sortedShuffleReaders.last.partitionSpecs.length === finalPartitionNum)
+ assert(df.rdd.partitions.length === finalPartitionNum)
+ }
+
+ withSQLConf(
+ SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
+ SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "3",
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key -> "true",
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION_WRITE_ONLY.key -> "false",
+ "spark.sql.adaptive.advisoryPartitionSizeInBytes" -> "1",
+ "spark.sql.adaptive.coalescePartitions.minPartitionSize" -> "1",
+ "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes" -> "10000000") {
+
+      // use a loop to double-check that final stage configs do not leak between SQL queries
+ (1 to 3).foreach { _ =>
+ checkPartitionNum(
+ "SELECT c1, count(*) FROM t1 GROUP BY c1",
+ 1,
+ 1)
+
+ checkPartitionNum(
+ "SELECT c2, count(*) FROM (SELECT c1, count(*) as c2 FROM t1 GROUP BY c1) GROUP BY c2",
+ 3,
+ 1)
+
+ checkPartitionNum(
+ "SELECT t1.c1, count(*) FROM t1 JOIN t2 ON t1.c2 = t2.c2 GROUP BY t1.c1",
+ 3,
+ 1)
+
+ checkPartitionNum(
+ """
+ | SELECT /*+ REPARTITION */
+ | t1.c1, count(*) FROM t1
+ | JOIN t2 ON t1.c2 = t2.c2
+ | JOIN t3 ON t1.c1 = t3.c1
+ | GROUP BY t1.c1
+ |""".stripMargin,
+ 3,
+ 1)
+
+ // one shuffle reader
+ checkPartitionNum(
+ """
+ | SELECT /*+ BROADCAST(t1) */
+ | t1.c1, t2.c2 FROM t1
+ | JOIN t2 ON t1.c2 = t2.c2
+ | DISTRIBUTE BY c1
+ |""".stripMargin,
+ 1,
+ 1)
+
+ // test ReusedExchange
+ checkPartitionNum(
+ """
+ |SELECT /*+ REPARTITION */ t0.c2 FROM (
+ |SELECT t1.c1, (count(*) + c1) as c2 FROM t1 GROUP BY t1.c1
+ |) t0 JOIN (
+ |SELECT t1.c1, (count(*) + c1) as c2 FROM t1 GROUP BY t1.c1
+ |) t1 ON t0.c2 = t1.c2
+ |""".stripMargin,
+ 3,
+ 1)
+
+ // one shuffle reader
+ checkPartitionNum(
+ """
+ |SELECT t0.c1 FROM (
+ |SELECT t1.c1 FROM t1 GROUP BY t1.c1
+ |) t0 JOIN (
+ |SELECT t1.c1 FROM t1 GROUP BY t1.c1
+ |) t1 ON t0.c1 = t1.c1
+ |""".stripMargin,
+ 1,
+ 1)
+ }
+ }
+ }
+
+ test("final stage config isolation write only") {
+ withSQLConf(
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key -> "true",
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION_WRITE_ONLY.key -> "true",
+ "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes" -> "7") {
+ sql("set spark.sql.adaptive.advisoryPartitionSizeInBytes=5")
+ sql("SELECT * FROM t1").count()
+ assert(spark.conf.getOption("spark.sql.adaptive.advisoryPartitionSizeInBytes")
+ .contains("5"))
+
+ withTable("tmp") {
+        sql("CREATE TABLE tmp USING PARQUET SELECT /*+ repartition */ 1 AS c1, 'a' AS c2")
+ assert(spark.conf.getOption("spark.sql.adaptive.advisoryPartitionSizeInBytes")
+ .contains("7"))
+ }
+
+ sql("SELECT * FROM t1").count()
+ assert(spark.conf.getOption("spark.sql.adaptive.advisoryPartitionSizeInBytes")
+ .contains("5"))
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala
new file mode 100644
index 00000000000..4b9991ef6f2
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/FinalStageResourceManagerSuite.scala
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkConf
+import org.scalatest.time.{Minutes, Span}
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+import org.apache.kyuubi.tags.SparkLocalClusterTest
+
+@SparkLocalClusterTest
+class FinalStageResourceManagerSuite extends KyuubiSparkSQLExtensionTest {
+
+ override def sparkConf(): SparkConf = {
+ // It is difficult to run spark in local-cluster mode when spark.testing is set.
+ sys.props.remove("spark.testing")
+
+ super.sparkConf().set("spark.master", "local-cluster[3, 1, 1024]")
+ .set("spark.dynamicAllocation.enabled", "true")
+ .set("spark.dynamicAllocation.initialExecutors", "3")
+ .set("spark.dynamicAllocation.minExecutors", "1")
+ .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
+ .set(KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key, "true")
+ .set(KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_ENABLED.key, "true")
+ }
+
+ test("[KYUUBI #5136][Bug] Final Stage hangs forever") {
+ // Prerequisite to reproduce the bug:
+ // 1. Dynamic allocation is enabled.
+ // 2. Dynamic allocation min executors is 1.
+ // 3. target executors < active executors.
+ // 4. No active executor is left after FinalStageResourceManager killed executors.
+    //         This is possible because the executors retained by FinalStageResourceManager may
+    //         already have been requested to be killed but have not yet died.
+ // 5. Final Stage required executors is 1.
+ withSQLConf(
+ (KyuubiSQLConf.FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL.key, "true")) {
+ withTable("final_stage") {
+ eventually(timeout(Span(10, Minutes))) {
+ sql(
+ "CREATE TABLE final_stage AS SELECT id, count(*) as num FROM (SELECT 0 id) GROUP BY id")
+ }
+ assert(FinalStageResourceManager.getAdjustedTargetExecutors(spark.sparkContext).get == 1)
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InjectResourceProfileSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InjectResourceProfileSuite.scala
new file mode 100644
index 00000000000..b0767b18708
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InjectResourceProfileSuite.scala
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
+import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+class InjectResourceProfileSuite extends KyuubiSparkSQLExtensionTest {
+ private def checkCustomResourceProfile(sqlString: String, exists: Boolean): Unit = {
+ @volatile var lastEvent: SparkListenerSQLAdaptiveExecutionUpdate = null
+ val listener = new SparkListener {
+ override def onOtherEvent(event: SparkListenerEvent): Unit = {
+ event match {
+ case e: SparkListenerSQLAdaptiveExecutionUpdate => lastEvent = e
+ case _ =>
+ }
+ }
+ }
+
+ spark.sparkContext.addSparkListener(listener)
+ try {
+ sql(sqlString).collect()
+ spark.sparkContext.listenerBus.waitUntilEmpty()
+ assert(lastEvent != null)
+ var current = lastEvent.sparkPlanInfo
+ var shouldStop = false
+ while (!shouldStop) {
+ if (current.nodeName != "CustomResourceProfile") {
+ if (current.children.isEmpty) {
+ assert(!exists)
+ shouldStop = true
+ } else {
+ current = current.children.head
+ }
+ } else {
+ assert(exists)
+ shouldStop = true
+ }
+ }
+ } finally {
+ spark.sparkContext.removeSparkListener(listener)
+ }
+ }
+
+ test("Inject resource profile") {
+ withTable("t") {
+ withSQLConf(
+ "spark.sql.adaptive.forceApply" -> "true",
+ KyuubiSQLConf.FINAL_STAGE_CONFIG_ISOLATION.key -> "true",
+ KyuubiSQLConf.FINAL_WRITE_STAGE_RESOURCE_ISOLATION_ENABLED.key -> "true") {
+
+ sql("CREATE TABLE t (c1 int, c2 string) USING PARQUET")
+
+ checkCustomResourceProfile("INSERT INTO TABLE t VALUES(1, 'a')", false)
+ checkCustomResourceProfile("SELECT 1", false)
+ checkCustomResourceProfile(
+ "INSERT INTO TABLE t SELECT /*+ rebalance */ * FROM VALUES(1, 'a')",
+ true)
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuite.scala
new file mode 100644
index 00000000000..f0d38465734
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuite.scala
@@ -0,0 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+class InsertShuffleNodeBeforeJoinSuite extends InsertShuffleNodeBeforeJoinSuiteBase
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuiteBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuiteBase.scala
new file mode 100644
index 00000000000..c657dee49f3
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/InsertShuffleNodeBeforeJoinSuiteBase.scala
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.execution.exchange.{ENSURE_REQUIREMENTS, ShuffleExchangeLike}
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+trait InsertShuffleNodeBeforeJoinSuiteBase extends KyuubiSparkSQLExtensionTest {
+ override protected def beforeAll(): Unit = {
+ super.beforeAll()
+ setupData()
+ }
+
+ override def sparkConf(): SparkConf = {
+ super.sparkConf()
+ .set(
+ StaticSQLConf.SPARK_SESSION_EXTENSIONS.key,
+ "org.apache.kyuubi.sql.KyuubiSparkSQLCommonExtension")
+ }
+
+ test("force shuffle before join") {
+ def checkShuffleNodeNum(sqlString: String, num: Int): Unit = {
+ var expectedResult: Seq[Row] = Seq.empty
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
+ expectedResult = sql(sqlString).collect()
+ }
+ val df = sql(sqlString)
+ checkAnswer(df, expectedResult)
+ assert(
+ collect(df.queryExecution.executedPlan) {
+ case shuffle: ShuffleExchangeLike if shuffle.shuffleOrigin == ENSURE_REQUIREMENTS =>
+ shuffle
+ }.size == num)
+ }
+
+ withSQLConf(
+ SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
+ KyuubiSQLConf.FORCE_SHUFFLE_BEFORE_JOIN.key -> "true") {
+ Seq("SHUFFLE_HASH", "MERGE").foreach { joinHint =>
+ // positive case
+ checkShuffleNodeNum(
+ s"""
+ |SELECT /*+ $joinHint(t2, t3) */ t1.c1, t1.c2, t2.c1, t3.c1 from t1
+ | JOIN t2 ON t1.c1 = t2.c1
+ | JOIN t3 ON t1.c1 = t3.c1
+ | """.stripMargin,
+ 4)
+
+ // negative case
+ checkShuffleNodeNum(
+ s"""
+ |SELECT /*+ $joinHint(t2, t3) */ t1.c1, t1.c2, t2.c1, t3.c1 from t1
+ | JOIN t2 ON t1.c1 = t2.c1
+ | JOIN t3 ON t1.c2 = t3.c2
+ | """.stripMargin,
+ 4)
+ }
+
+ checkShuffleNodeNum(
+ """
+ |SELECT t1.c1, t2.c1, t3.c2 from t1
+ | JOIN t2 ON t1.c1 = t2.c1
+ | JOIN (
+ | SELECT c2, count(*) FROM t1 GROUP BY c2
+ | ) t3 ON t1.c1 = t3.c2
+ | """.stripMargin,
+ 5)
+
+ checkShuffleNodeNum(
+ """
+ |SELECT t1.c1, t2.c1, t3.c1 from t1
+ | JOIN t2 ON t1.c1 = t2.c1
+ | JOIN (
+ | SELECT c1, count(*) FROM t1 GROUP BY c1
+ | ) t3 ON t1.c1 = t3.c1
+ | """.stripMargin,
+ 5)
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/KyuubiSparkSQLExtensionTest.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/KyuubiSparkSQLExtensionTest.scala
new file mode 100644
index 00000000000..dd9ffbf169e
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/KyuubiSparkSQLExtensionTest.scala
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
+import org.apache.spark.sql.execution.command.{DataWritingCommand, DataWritingCommandExec}
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.SQLTestData.TestData
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.sql.util.QueryExecutionListener
+import org.apache.spark.util.Utils
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+trait KyuubiSparkSQLExtensionTest extends QueryTest
+ with SQLTestUtils
+ with AdaptiveSparkPlanHelper {
+ sys.props.put("spark.testing", "1")
+
+ private var _spark: Option[SparkSession] = None
+ protected def spark: SparkSession = _spark.getOrElse {
+    throw new RuntimeException("the test SparkSession was not initialized before use.")
+ }
+
+ override protected def beforeAll(): Unit = {
+ if (_spark.isEmpty) {
+ _spark = Option(SparkSession.builder()
+ .master("local[1]")
+ .config(sparkConf)
+ .enableHiveSupport()
+ .getOrCreate())
+ }
+ super.beforeAll()
+ }
+
+ override protected def afterAll(): Unit = {
+ super.afterAll()
+ cleanupData()
+ _spark.foreach(_.stop)
+ }
+
+ protected def setupData(): Unit = {
+ val self = spark
+ import self.implicits._
+ spark.sparkContext.parallelize(
+ (1 to 100).map(i => TestData(i, i.toString)),
+ 10)
+ .toDF("c1", "c2").createOrReplaceTempView("t1")
+ spark.sparkContext.parallelize(
+ (1 to 10).map(i => TestData(i, i.toString)),
+ 5)
+ .toDF("c1", "c2").createOrReplaceTempView("t2")
+ spark.sparkContext.parallelize(
+ (1 to 50).map(i => TestData(i, i.toString)),
+ 2)
+ .toDF("c1", "c2").createOrReplaceTempView("t3")
+ }
+
+ private def cleanupData(): Unit = {
+ spark.sql("DROP VIEW IF EXISTS t1")
+ spark.sql("DROP VIEW IF EXISTS t2")
+ spark.sql("DROP VIEW IF EXISTS t3")
+ }
+
+ def sparkConf(): SparkConf = {
+ val basePath = Utils.createTempDir() + "/" + getClass.getCanonicalName
+ val metastorePath = basePath + "/metastore_db"
+ val warehousePath = basePath + "/warehouse"
+ new SparkConf()
+ .set(
+ StaticSQLConf.SPARK_SESSION_EXTENSIONS.key,
+ "org.apache.kyuubi.sql.KyuubiSparkSQLExtension")
+ .set(KyuubiSQLConf.SQL_CLASSIFICATION_ENABLED.key, "true")
+ .set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true")
+ .set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
+ .set("spark.hadoop.hive.metastore.client.capability.check", "false")
+ .set(
+ ConfVars.METASTORECONNECTURLKEY.varname,
+ s"jdbc:derby:;databaseName=$metastorePath;create=true")
+ .set(StaticSQLConf.WAREHOUSE_PATH, warehousePath)
+ .set("spark.ui.enabled", "false")
+ }
+
+ def withListener(sqlString: String)(callback: DataWritingCommand => Unit): Unit = {
+ withListener(sql(sqlString))(callback)
+ }
+
+ def withListener(df: => DataFrame)(callback: DataWritingCommand => Unit): Unit = {
+ val listener = new QueryExecutionListener {
+ override def onFailure(f: String, qe: QueryExecution, e: Exception): Unit = {}
+
+ override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
+ qe.executedPlan match {
+ case write: DataWritingCommandExec => callback(write.cmd)
+ case _ =>
+ }
+ }
+ }
+ spark.listenerManager.register(listener)
+ try {
+ df.collect()
+ sparkContext.listenerBus.waitUntilEmpty()
+ } finally {
+ spark.listenerManager.unregister(listener)
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/RebalanceBeforeWritingSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/RebalanceBeforeWritingSuite.scala
new file mode 100644
index 00000000000..1d9630f4937
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/RebalanceBeforeWritingSuite.scala
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, RebalancePartitions, Sort}
+import org.apache.spark.sql.execution.command.DataWritingCommand
+import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+class RebalanceBeforeWritingSuite extends KyuubiSparkSQLExtensionTest {
+
+ test("check rebalance exists") {
+ def check(df: => DataFrame, expectedRebalanceNum: Int = 1): Unit = {
+ withSQLConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE.key -> "true") {
+ withListener(df) { write =>
+ assert(write.collect {
+ case r: RebalancePartitions => r
+ }.size == expectedRebalanceNum)
+ }
+ }
+ withSQLConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE.key -> "false") {
+ withListener(df) { write =>
+ assert(write.collect {
+ case r: RebalancePartitions => r
+ }.isEmpty)
+ }
+ }
+ }
+
+    // It's better to set the config explicitly in case we change the default value.
+ withSQLConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE.key -> "true") {
+ Seq("USING PARQUET", "").foreach { storage =>
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage PARTITIONED BY (c2 string)")
+ check(sql("INSERT INTO TABLE tmp1 PARTITION(c2='a') " +
+ "SELECT * FROM VALUES(1),(2) AS t(c1)"))
+ }
+
+ withTable("tmp1", "tmp2") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage PARTITIONED BY (c2 string)")
+ sql(s"CREATE TABLE tmp2 (c1 int) $storage PARTITIONED BY (c2 string)")
+ check(
+ sql(
+ """FROM VALUES(1),(2)
+ |INSERT INTO TABLE tmp1 PARTITION(c2='a') SELECT *
+ |INSERT INTO TABLE tmp2 PARTITION(c2='a') SELECT *
+ |""".stripMargin),
+ 2)
+ }
+
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage")
+ check(sql("INSERT INTO TABLE tmp1 SELECT * FROM VALUES(1),(2),(3) AS t(c1)"))
+ }
+
+ withTable("tmp1", "tmp2") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage")
+ sql(s"CREATE TABLE tmp2 (c1 int) $storage")
+ check(
+ sql(
+ """FROM VALUES(1),(2),(3)
+ |INSERT INTO TABLE tmp1 SELECT *
+ |INSERT INTO TABLE tmp2 SELECT *
+ |""".stripMargin),
+ 2)
+ }
+
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 $storage AS SELECT * FROM VALUES(1),(2),(3) AS t(c1)")
+ }
+
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 $storage PARTITIONED BY(c2) AS " +
+ s"SELECT * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2)")
+ }
+ }
+ }
+ }
+
+  test("check rebalance does not exist") {
+ def check(df: DataFrame): Unit = {
+ withListener(df) { write =>
+ assert(write.collect {
+ case r: RebalancePartitions => r
+ }.isEmpty)
+ }
+ }
+
+ withSQLConf(
+ KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE.key -> "true",
+ KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE.key -> "true") {
+ // test no write command
+ check(sql("SELECT * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2)"))
+ check(sql("SELECT count(*) FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2)"))
+
+ // test not supported plan
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) PARTITIONED BY (c2 string)")
+ check(sql("INSERT INTO TABLE tmp1 PARTITION(c2) " +
+ "SELECT /*+ repartition(10) */ * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2)"))
+ check(sql("INSERT INTO TABLE tmp1 PARTITION(c2) " +
+ "SELECT * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2) ORDER BY c1"))
+ check(sql("INSERT INTO TABLE tmp1 PARTITION(c2) " +
+ "SELECT * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2) LIMIT 10"))
+ }
+ }
+
+ withSQLConf(KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE.key -> "false") {
+ Seq("USING PARQUET", "").foreach { storage =>
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage PARTITIONED BY (c2 string)")
+ check(sql("INSERT INTO TABLE tmp1 PARTITION(c2) " +
+ "SELECT * FROM VALUES(1, 'a'),(2, 'b') AS t(c1, c2)"))
+ }
+
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage")
+ check(sql("INSERT INTO TABLE tmp1 SELECT * FROM VALUES(1),(2),(3) AS t(c1)"))
+ }
+ }
+ }
+ }
+
+ test("test dynamic partition write") {
+ def checkRepartitionExpression(sqlString: String): Unit = {
+ withListener(sqlString) { write =>
+ assert(write.isInstanceOf[InsertIntoHiveTable])
+ assert(write.collect {
+ case r: RebalancePartitions if r.partitionExpressions.size == 1 =>
+ assert(r.partitionExpressions.head.asInstanceOf[Attribute].name === "c2")
+ r
+ }.size == 1)
+ }
+ }
+
+ withSQLConf(
+ KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE.key -> "true",
+ KyuubiSQLConf.DYNAMIC_PARTITION_INSERTION_REPARTITION_NUM.key -> "2",
+ KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE.key -> "true") {
+ Seq("USING PARQUET", "").foreach { storage =>
+ withTable("tmp1") {
+ sql(s"CREATE TABLE tmp1 (c1 int) $storage PARTITIONED BY (c2 string)")
+ checkRepartitionExpression("INSERT INTO TABLE tmp1 SELECT 1 as c1, 'a' as c2 ")
+ }
+
+ withTable("tmp1") {
+ checkRepartitionExpression(
+ "CREATE TABLE tmp1 PARTITIONED BY(C2) SELECT 1 as c1, 'a' as c2")
+ }
+ }
+ }
+ }
+
+ test("OptimizedCreateHiveTableAsSelectCommand") {
+ withSQLConf(
+ HiveUtils.CONVERT_METASTORE_PARQUET.key -> "true",
+ HiveUtils.CONVERT_METASTORE_CTAS.key -> "true",
+ KyuubiSQLConf.INSERT_REPARTITION_BEFORE_WRITE_IF_NO_SHUFFLE.key -> "true") {
+ withTable("t") {
+ withListener("CREATE TABLE t STORED AS parquet AS SELECT 1 as a") { write =>
+ assert(write.isInstanceOf[InsertIntoHadoopFsRelationCommand])
+ assert(write.collect {
+ case _: RebalancePartitions => true
+ }.size == 1)
+ }
+ }
+ }
+ }
+
+  test("Infer rebalance and sort orders") {
+ def checkShuffleAndSort(dataWritingCommand: LogicalPlan, sSize: Int, rSize: Int): Unit = {
+ assert(dataWritingCommand.isInstanceOf[DataWritingCommand])
+ val plan = dataWritingCommand.asInstanceOf[DataWritingCommand].query
+ assert(plan.collect {
+ case s: Sort => s
+ }.size == sSize)
+ assert(plan.collect {
+ case r: RebalancePartitions if r.partitionExpressions.size == rSize => r
+ }.nonEmpty || rSize == 0)
+ }
+
+ withView("v") {
+ withTable("t", "input1", "input2") {
+ withSQLConf(KyuubiSQLConf.INFER_REBALANCE_AND_SORT_ORDERS.key -> "true") {
+ sql(s"CREATE TABLE t (c1 int, c2 long) USING PARQUET PARTITIONED BY (p string)")
+ sql(s"CREATE TABLE input1 USING PARQUET AS SELECT * FROM VALUES(1,2),(1,3)")
+ sql(s"CREATE TABLE input2 USING PARQUET AS SELECT * FROM VALUES(1,3),(1,3)")
+ sql(s"CREATE VIEW v as SELECT col1, count(*) as col2 FROM input1 GROUP BY col1")
+
+ val df0 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p='a')
+ |SELECT /*+ broadcast(input2) */ input1.col1, input2.col1
+ |FROM input1
+ |JOIN input2
+ |ON input1.col1 = input2.col1
+ |""".stripMargin)
+ checkShuffleAndSort(df0.queryExecution.analyzed, 1, 1)
+
+ val df1 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p='a')
+ |SELECT /*+ broadcast(input2) */ input1.col1, input1.col2
+ |FROM input1
+ |LEFT JOIN input2
+ |ON input1.col1 = input2.col1 and input1.col2 = input2.col2
+ |""".stripMargin)
+ checkShuffleAndSort(df1.queryExecution.analyzed, 1, 2)
+
+ val df2 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p='a')
+ |SELECT col1 as c1, count(*) as c2
+ |FROM input1
+ |GROUP BY col1
+ |HAVING count(*) > 0
+ |""".stripMargin)
+ checkShuffleAndSort(df2.queryExecution.analyzed, 1, 1)
+
+ // dynamic partition
+ val df3 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p)
+ |SELECT /*+ broadcast(input2) */ input1.col1, input1.col2, input1.col2
+ |FROM input1
+ |JOIN input2
+ |ON input1.col1 = input2.col1
+ |""".stripMargin)
+ checkShuffleAndSort(df3.queryExecution.analyzed, 0, 1)
+
+ // non-deterministic
+ val df4 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p='a')
+ |SELECT col1 + rand(), count(*) as c2
+ |FROM input1
+ |GROUP BY col1
+ |""".stripMargin)
+ checkShuffleAndSort(df4.queryExecution.analyzed, 0, 0)
+
+ // view
+ val df5 = sql(
+ s"""
+ |INSERT INTO TABLE t PARTITION(p='a')
+ |SELECT * FROM v
+ |""".stripMargin)
+ checkShuffleAndSort(df5.queryExecution.analyzed, 1, 1)
+ }
+ }
+ }
+ }
+}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiEvent.java b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuite.scala
similarity index 91%
rename from kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiEvent.java
rename to extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuite.scala
index 8de12508914..957089340ca 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiEvent.java
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuite.scala
@@ -15,6 +15,6 @@
* limitations under the License.
*/
-package org.apache.kyuubi.client.api.v1.dto;
+package org.apache.spark.sql
-public interface KyuubiEvent {}
+class WatchDogSuite extends WatchDogSuiteBase {}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala
new file mode 100644
index 00000000000..a202e813c5e
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala
@@ -0,0 +1,601 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.catalyst.plans.logical.{GlobalLimit, LogicalPlan}
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+import org.apache.kyuubi.sql.watchdog.{MaxFileSizeExceedException, MaxPartitionExceedException}
+
+trait WatchDogSuiteBase extends KyuubiSparkSQLExtensionTest {
+ override protected def beforeAll(): Unit = {
+ super.beforeAll()
+ setupData()
+ }
+
+ case class LimitAndExpected(limit: Int, expected: Int)
+
+ val limitAndExpecteds = List(LimitAndExpected(1, 1), LimitAndExpected(11, 10))
+
+ private def checkMaxPartition: Unit = {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_PARTITIONS.key -> "100") {
+ checkAnswer(sql("SELECT count(distinct(p)) FROM test"), Row(10) :: Nil)
+ }
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_PARTITIONS.key -> "5") {
+ sql("SELECT * FROM test where p=1").queryExecution.sparkPlan
+
+ sql(s"SELECT * FROM test WHERE p in (${Range(0, 5).toList.mkString(",")})")
+ .queryExecution.sparkPlan
+
+ intercept[MaxPartitionExceedException](
+ sql("SELECT * FROM test where p != 1").queryExecution.sparkPlan)
+
+ intercept[MaxPartitionExceedException](
+ sql("SELECT * FROM test").queryExecution.sparkPlan)
+
+ intercept[MaxPartitionExceedException](sql(
+ s"SELECT * FROM test WHERE p in (${Range(0, 6).toList.mkString(",")})")
+ .queryExecution.sparkPlan)
+ }
+ }
+
+ test("watchdog with scan maxPartitions -- hive") {
+ Seq("textfile", "parquet").foreach { format =>
+ withTable("test", "temp") {
+ sql(
+ s"""
+ |CREATE TABLE test(i int)
+ |PARTITIONED BY (p int)
+ |STORED AS $format""".stripMargin)
+ spark.range(0, 10, 1).selectExpr("id as col")
+ .createOrReplaceTempView("temp")
+
+ for (part <- Range(0, 10)) {
+ sql(
+ s"""
+ |INSERT OVERWRITE TABLE test PARTITION (p='$part')
+ |select col from temp""".stripMargin)
+ }
+ checkMaxPartition
+ }
+ }
+ }
+
+ test("watchdog with scan maxPartitions -- data source") {
+ withTempDir { dir =>
+ withTempView("test") {
+ spark.range(10).selectExpr("id", "id as p")
+ .write
+ .partitionBy("p")
+ .mode("overwrite")
+ .save(dir.getCanonicalPath)
+ spark.read.load(dir.getCanonicalPath).createOrReplaceTempView("test")
+ checkMaxPartition
+ }
+ }
+ }
+
+ test("test watchdog: simple SELECT STATEMENT") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+
+ List("", "ORDER BY c1", "ORDER BY c2").foreach { sort =>
+ List("", " DISTINCT").foreach { distinct =>
+ assert(sql(
+ s"""
+ |SELECT $distinct *
+ |FROM t1
+ |$sort
+ |""".stripMargin).queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+
+ limitAndExpecteds.foreach { case LimitAndExpected(limit, expected) =>
+ List("", "ORDER BY c1", "ORDER BY c2").foreach { sort =>
+ List("", "DISTINCT").foreach { distinct =>
+ assert(sql(
+ s"""
+ |SELECT $distinct *
+ |FROM t1
+ |$sort
+ |LIMIT $limit
+ |""".stripMargin).queryExecution.optimizedPlan.maxRows.contains(expected))
+ }
+ }
+ }
+ }
+ }
+
+  test("test watchdog: SELECT ... WITH AGGREGATE STATEMENT") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+
+ assert(!sql("SELECT count(*) FROM t1")
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+
+ val sorts = List("", "ORDER BY cnt", "ORDER BY c1", "ORDER BY cnt, c1", "ORDER BY c1, cnt")
+ val havingConditions = List("", "HAVING cnt > 1")
+
+ havingConditions.foreach { having =>
+ sorts.foreach { sort =>
+ assert(sql(
+ s"""
+ |SELECT c1, COUNT(*) as cnt
+ |FROM t1
+ |GROUP BY c1
+ |$having
+ |$sort
+ |""".stripMargin).queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+
+ limitAndExpecteds.foreach { case LimitAndExpected(limit, expected) =>
+ havingConditions.foreach { having =>
+ sorts.foreach { sort =>
+ assert(sql(
+ s"""
+ |SELECT c1, COUNT(*) as cnt
+ |FROM t1
+ |GROUP BY c1
+ |$having
+ |$sort
+ |LIMIT $limit
+ |""".stripMargin).queryExecution.optimizedPlan.maxRows.contains(expected))
+ }
+ }
+ }
+ }
+ }
+
+ test("test watchdog: SELECT with CTE forceMaxOutputRows") {
+ // simple CTE
+ val q1 =
+ """
+ |WITH t2 AS (
+ | SELECT * FROM t1
+ |)
+ |""".stripMargin
+
+ // nested CTE
+ val q2 =
+ """
+ |WITH
+ | t AS (SELECT * FROM t1),
+ | t2 AS (
+ | WITH t3 AS (SELECT * FROM t1)
+ | SELECT * FROM t3
+ | )
+ |""".stripMargin
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+
+ val sorts = List("", "ORDER BY c1", "ORDER BY c2")
+
+ sorts.foreach { sort =>
+ Seq(q1, q2).foreach { withQuery =>
+ assert(sql(
+ s"""
+ |$withQuery
+ |SELECT * FROM t2
+ |$sort
+ |""".stripMargin).queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+
+ limitAndExpecteds.foreach { case LimitAndExpected(limit, expected) =>
+ sorts.foreach { sort =>
+ Seq(q1, q2).foreach { withQuery =>
+ assert(sql(
+ s"""
+ |$withQuery
+ |SELECT * FROM t2
+ |$sort
+ |LIMIT $limit
+ |""".stripMargin).queryExecution.optimizedPlan.maxRows.contains(expected))
+ }
+ }
+ }
+ }
+ }
+
+ test("test watchdog: SELECT AGGREGATE WITH CTE forceMaxOutputRows") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+
+ assert(!sql(
+ """
+ |WITH custom_cte AS (
+ |SELECT * FROM t1
+ |)
+ |
+ |SELECT COUNT(*)
+ |FROM custom_cte
+ |""".stripMargin).queryExecution
+ .analyzed.isInstanceOf[GlobalLimit])
+
+ val sorts = List("", "ORDER BY cnt", "ORDER BY c1", "ORDER BY cnt, c1", "ORDER BY c1, cnt")
+ val havingConditions = List("", "HAVING cnt > 1")
+
+ havingConditions.foreach { having =>
+ sorts.foreach { sort =>
+ assert(sql(
+ s"""
+ |WITH custom_cte AS (
+ |SELECT * FROM t1
+ |)
+ |
+ |SELECT c1, COUNT(*) as cnt
+ |FROM custom_cte
+ |GROUP BY c1
+ |$having
+ |$sort
+ |""".stripMargin).queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+
+ limitAndExpecteds.foreach { case LimitAndExpected(limit, expected) =>
+ havingConditions.foreach { having =>
+ sorts.foreach { sort =>
+ assert(sql(
+ s"""
+ |WITH custom_cte AS (
+ |SELECT * FROM t1
+ |)
+ |
+ |SELECT c1, COUNT(*) as cnt
+ |FROM custom_cte
+ |GROUP BY c1
+ |$having
+ |$sort
+ |LIMIT $limit
+ |""".stripMargin).queryExecution.optimizedPlan.maxRows.contains(expected))
+ }
+ }
+ }
+ }
+ }
+
+ test("test watchdog: UNION Statement for forceMaxOutputRows") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+
+ List("", "ALL").foreach { x =>
+ assert(sql(
+ s"""
+ |SELECT c1, c2 FROM t1
+ |UNION $x
+ |SELECT c1, c2 FROM t2
+ |UNION $x
+ |SELECT c1, c2 FROM t3
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+
+ val sorts = List("", "ORDER BY cnt", "ORDER BY c1", "ORDER BY cnt, c1", "ORDER BY c1, cnt")
+ val havingConditions = List("", "HAVING cnt > 1")
+
+ List("", "ALL").foreach { x =>
+ havingConditions.foreach { having =>
+ sorts.foreach { sort =>
+ assert(sql(
+ s"""
+ |SELECT c1, count(c2) as cnt
+ |FROM t1
+ |GROUP BY c1
+ |$having
+ |UNION $x
+ |SELECT c1, COUNT(c2) as cnt
+ |FROM t2
+ |GROUP BY c1
+ |$having
+ |UNION $x
+ |SELECT c1, COUNT(c2) as cnt
+ |FROM t3
+ |GROUP BY c1
+ |$having
+ |$sort
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+ }
+
+ limitAndExpecteds.foreach { case LimitAndExpected(limit, expected) =>
+ assert(sql(
+ s"""
+ |SELECT c1, c2 FROM t1
+ |UNION
+ |SELECT c1, c2 FROM t2
+ |UNION
+ |SELECT c1, c2 FROM t3
+ |LIMIT $limit
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.maxRows.contains(expected))
+ }
+ }
+ }
+
+ test("test watchdog: Select View Statement for forceMaxOutputRows") {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "3") {
+ withTable("tmp_table", "tmp_union") {
+ withView("tmp_view", "tmp_view2") {
+ sql(s"create table tmp_table (a int, b int)")
+ sql(s"insert into tmp_table values (1,10),(2,20),(3,30),(4,40),(5,50)")
+ sql(s"create table tmp_union (a int, b int)")
+ sql(s"insert into tmp_union values (6,60),(7,70),(8,80),(9,90),(10,100)")
+ sql(s"create view tmp_view2 as select * from tmp_union")
+ assert(!sql(
+ s"""
+ |CREATE VIEW tmp_view
+ |as
+ |SELECT * FROM
+ |tmp_table
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+
+ assert(sql(
+ s"""
+ |SELECT * FROM
+ |tmp_view
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.maxRows.contains(3))
+
+ assert(sql(
+ s"""
+ |SELECT * FROM
+ |tmp_view
+ |limit 11
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.maxRows.contains(3))
+
+ assert(sql(
+ s"""
+ |SELECT * FROM
+ |(select * from tmp_view
+ |UNION
+ |select * from tmp_view2)
+ |ORDER BY a
+ |DESC
+ |""".stripMargin)
+ .collect().head.get(0) === 10)
+ }
+ }
+ }
+ }
+
+ test("test watchdog: Insert Statement for forceMaxOutputRows") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+ withTable("tmp_table", "tmp_insert") {
+ spark.sql(s"create table tmp_table (a int, b int)")
+ spark.sql(s"insert into tmp_table values (1,10),(2,20),(3,30),(4,40),(5,50)")
+ val multiInsertTableName1: String = "tmp_tbl1"
+ val multiInsertTableName2: String = "tmp_tbl2"
+ sql(s"drop table if exists $multiInsertTableName1")
+ sql(s"drop table if exists $multiInsertTableName2")
+ sql(s"create table $multiInsertTableName1 like tmp_table")
+ sql(s"create table $multiInsertTableName2 like tmp_table")
+ assert(!sql(
+ s"""
+ |FROM tmp_table
+ |insert into $multiInsertTableName1 select * limit 2
+ |insert into $multiInsertTableName2 select *
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+ }
+
+ test("test watchdog: Distribute by for forceMaxOutputRows") {
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "10") {
+ withTable("tmp_table") {
+ spark.sql(s"create table tmp_table (a int, b int)")
+ spark.sql(s"insert into tmp_table values (1,10),(2,20),(3,30),(4,40),(5,50)")
+ assert(sql(
+ s"""
+ |SELECT *
+ |FROM tmp_table
+ |DISTRIBUTE BY a
+ |""".stripMargin)
+ .queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ }
+ }
+ }
+
+ test("test watchdog: Subquery for forceMaxOutputRows") {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "1") {
+ withTable("tmp_table1") {
+ sql("CREATE TABLE spark_catalog.`default`.tmp_table1(KEY INT, VALUE STRING) USING PARQUET")
+ sql("INSERT INTO TABLE spark_catalog.`default`.tmp_table1 " +
+ "VALUES (1, 'aa'),(2,'bb'),(3, 'cc'),(4,'aa'),(5,'cc'),(6, 'aa')")
+ assert(
+ sql("select * from tmp_table1").queryExecution.optimizedPlan.isInstanceOf[GlobalLimit])
+ val testSqlText =
+ """
+ |select count(*)
+ |from tmp_table1
+ |where tmp_table1.key in (
+ |select distinct tmp_table1.key
+ |from tmp_table1
+ |where tmp_table1.value = "aa"
+ |)
+ |""".stripMargin
+ val plan = sql(testSqlText).queryExecution.optimizedPlan
+ assert(!findGlobalLimit(plan))
+ checkAnswer(sql(testSqlText), Row(3) :: Nil)
+ }
+
+ def findGlobalLimit(plan: LogicalPlan): Boolean = plan match {
+ case _: GlobalLimit => true
+ case p if p.children.isEmpty => false
+ case p => p.children.exists(findGlobalLimit)
+ }
+
+ }
+ }
+
+ test("test watchdog: Join for forceMaxOutputRows") {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_FORCED_MAXOUTPUTROWS.key -> "1") {
+ withTable("tmp_table1", "tmp_table2") {
+ sql("CREATE TABLE spark_catalog.`default`.tmp_table1(KEY INT, VALUE STRING) USING PARQUET")
+ sql("INSERT INTO TABLE spark_catalog.`default`.tmp_table1 " +
+ "VALUES (1, 'aa'),(2,'bb'),(3, 'cc'),(4,'aa'),(5,'cc'),(6, 'aa')")
+ sql("CREATE TABLE spark_catalog.`default`.tmp_table2(KEY INT, VALUE STRING) USING PARQUET")
+ sql("INSERT INTO TABLE spark_catalog.`default`.tmp_table2 " +
+ "VALUES (1, 'aa'),(2,'bb'),(3, 'cc'),(4,'aa'),(5,'cc'),(6, 'aa')")
+ val testSqlText =
+ """
+ |select a.*,b.*
+ |from tmp_table1 a
+ |join
+ |tmp_table2 b
+ |on a.KEY = b.KEY
+ |""".stripMargin
+ val plan = sql(testSqlText).queryExecution.optimizedPlan
+ assert(findGlobalLimit(plan))
+ }
+
+ def findGlobalLimit(plan: LogicalPlan): Boolean = plan match {
+ case _: GlobalLimit => true
+ case p if p.children.isEmpty => false
+ case p => p.children.exists(findGlobalLimit)
+ }
+ }
+ }
+
+ private def checkMaxFileSize(tableSize: Long, nonPartTableSize: Long): Unit = {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> tableSize.toString) {
+ checkAnswer(sql("SELECT count(distinct(p)) FROM test"), Row(10) :: Nil)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> (tableSize / 2).toString) {
+ sql("SELECT * FROM test where p=1").queryExecution.sparkPlan
+
+ sql(s"SELECT * FROM test WHERE p in (${Range(0, 3).toList.mkString(",")})")
+ .queryExecution.sparkPlan
+
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test where p != 1").queryExecution.sparkPlan)
+
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test").queryExecution.sparkPlan)
+
+ intercept[MaxFileSizeExceedException](sql(
+ s"SELECT * FROM test WHERE p in (${Range(0, 6).toList.mkString(",")})")
+ .queryExecution.sparkPlan)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> nonPartTableSize.toString) {
+ checkAnswer(sql("SELECT count(*) FROM test_non_part"), Row(10000) :: Nil)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> (nonPartTableSize - 1).toString) {
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test_non_part").queryExecution.sparkPlan)
+ }
+ }
+
+ test("watchdog with scan maxFileSize -- hive") {
+ Seq(false).foreach { convertMetastoreParquet =>
+ withTable("test", "test_non_part", "temp") {
+ spark.range(10000).selectExpr("id as col")
+ .createOrReplaceTempView("temp")
+
+ // partitioned table
+ sql(
+ s"""
+ |CREATE TABLE test(i int)
+ |PARTITIONED BY (p int)
+ |STORED AS parquet""".stripMargin)
+ for (part <- Range(0, 10)) {
+ sql(
+ s"""
+ |INSERT OVERWRITE TABLE test PARTITION (p='$part')
+ |select col from temp""".stripMargin)
+ }
+
+ val tablePath = new File(spark.sessionState.catalog.externalCatalog
+ .getTable("default", "test").location)
+ val tableSize = FileUtils.listFiles(tablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(tableSize > 0)
+
+ // non-partitioned table
+ sql(
+ s"""
+ |CREATE TABLE test_non_part(i int)
+ |STORED AS parquet""".stripMargin)
+ sql(
+ s"""
+ |INSERT OVERWRITE TABLE test_non_part
+ |select col from temp""".stripMargin)
+ sql("ANALYZE TABLE test_non_part COMPUTE STATISTICS")
+
+ val nonPartTablePath = new File(spark.sessionState.catalog.externalCatalog
+ .getTable("default", "test_non_part").location)
+ val nonPartTableSize = FileUtils.listFiles(nonPartTablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(nonPartTableSize > 0)
+
+ // check
+ withSQLConf("spark.sql.hive.convertMetastoreParquet" -> convertMetastoreParquet.toString) {
+ checkMaxFileSize(tableSize, nonPartTableSize)
+ }
+ }
+ }
+ }
+
+ test("watchdog with scan maxFileSize -- data source") {
+ withTempDir { dir =>
+ withTempView("test", "test_non_part") {
+ // partitioned table
+ val tablePath = new File(dir, "test")
+ spark.range(10).selectExpr("id", "id as p")
+ .write
+ .partitionBy("p")
+ .mode("overwrite")
+ .parquet(tablePath.getCanonicalPath)
+ spark.read.load(tablePath.getCanonicalPath).createOrReplaceTempView("test")
+
+ val tableSize = FileUtils.listFiles(tablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(tableSize > 0)
+
+ // non-partitioned table
+ val nonPartTablePath = new File(dir, "test_non_part")
+ spark.range(10000).selectExpr("id", "id as p")
+ .write
+ .mode("overwrite")
+ .parquet(nonPartTablePath.getCanonicalPath)
+ spark.read.load(nonPartTablePath.getCanonicalPath).createOrReplaceTempView("test_non_part")
+
+ val nonPartTableSize = FileUtils.listFiles(nonPartTablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+      assert(nonPartTableSize > 0)
+
+ // check
+ checkMaxFileSize(tableSize, nonPartTableSize)
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderCoreBenchmark.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderCoreBenchmark.scala
new file mode 100644
index 00000000000..9b1614fce31
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderCoreBenchmark.scala
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkConf
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.benchmark.KyuubiBenchmarkBase
+import org.apache.spark.sql.internal.StaticSQLConf
+
+import org.apache.kyuubi.sql.zorder.ZorderBytesUtils
+
+/**
+ * Benchmark to measure performance with zorder core.
+ *
+ * {{{
+ * RUN_BENCHMARK=1 ./build/mvn clean test \
+ *   -pl extensions/spark/kyuubi-extension-spark-3-4 -am \
+ *   -Pspark-3.4,kyuubi-extension-spark-3-4 \
+ * -Dtest=none -DwildcardSuites=org.apache.spark.sql.ZorderCoreBenchmark
+ * }}}
+ */
+class ZorderCoreBenchmark extends KyuubiSparkSQLExtensionTest with KyuubiBenchmarkBase {
+ private val runBenchmark = sys.env.contains("RUN_BENCHMARK")
+ private val numRows = 1 * 1000 * 1000
+
+ private def randomInt(numColumns: Int): Seq[Array[Any]] = {
+ (1 to numRows).map { l =>
+ val arr = new Array[Any](numColumns)
+ (0 until numColumns).foreach(col => arr(col) = l)
+ arr
+ }
+ }
+
+ private def randomLong(numColumns: Int): Seq[Array[Any]] = {
+ (1 to numRows).map { l =>
+ val arr = new Array[Any](numColumns)
+ (0 until numColumns).foreach(col => arr(col) = l.toLong)
+ arr
+ }
+ }
+
+ private def interleaveMultiByteArrayBenchmark(): Unit = {
+ val benchmark =
+ new Benchmark(s"$numRows rows zorder core benchmark", numRows, output = output)
+ benchmark.addCase("2 int columns benchmark", 3) { _ =>
+ randomInt(2).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.addCase("3 int columns benchmark", 3) { _ =>
+ randomInt(3).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.addCase("4 int columns benchmark", 3) { _ =>
+ randomInt(4).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.addCase("2 long columns benchmark", 3) { _ =>
+ randomLong(2).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.addCase("3 long columns benchmark", 3) { _ =>
+ randomLong(3).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.addCase("4 long columns benchmark", 3) { _ =>
+ randomLong(4).foreach(ZorderBytesUtils.interleaveBits)
+ }
+
+ benchmark.run()
+ }
+
+  private def paddingTo8ByteBenchmark(): Unit = {
+ val iterations = 10 * 1000 * 1000
+
+ val b2 = Array('a'.toByte, 'b'.toByte)
+ val benchmark =
+ new Benchmark(s"$iterations iterations paddingTo8Byte benchmark", iterations, output = output)
+ benchmark.addCase("2 length benchmark", 3) { _ =>
+ (1 to iterations).foreach(_ => ZorderBytesUtils.paddingTo8Byte(b2))
+ }
+
+ val b16 = Array.tabulate(16) { i => i.toByte }
+ benchmark.addCase("16 length benchmark", 3) { _ =>
+ (1 to iterations).foreach(_ => ZorderBytesUtils.paddingTo8Byte(b16))
+ }
+
+ benchmark.run()
+ }
+
+ test("zorder core benchmark") {
+ assume(runBenchmark)
+
+ withHeader {
+ interleaveMultiByteArrayBenchmark()
+ paddingTo8ByteBenchmark()
+ }
+ }
+
+ override def sparkConf(): SparkConf = {
+ super.sparkConf().remove(StaticSQLConf.SPARK_SESSION_EXTENSIONS.key)
+ }
+}
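
The benchmark above hammers `ZorderBytesUtils.interleaveBits`, which produces the Morton (z-order) key for a row. As a rough illustration of the primitive, here is a toy two-column version in plain Scala; it handles non-negative `Int`s only, whereas the real utility normalizes signs and widens every value to 8 bytes first:

```scala
object MortonSketch {
  // Interleave bits: bit i of `a` goes to position 2*i and bit i of `b`
  // to position 2*i + 1, alternating the two dimensions bit by bit.
  def interleave(a: Int, b: Int): Long = {
    var result = 0L
    var i = 0
    while (i < 31) {
      result |= ((a.toLong >> i) & 1L) << (2 * i)
      result |= ((b.toLong >> i) & 1L) << (2 * i + 1)
      i += 1
    }
    result
  }
}

// Sorting a 4x4 grid of (c1, c2) pairs by the interleaved key walks the
// grid in the Z-shaped order the suites' `target` sequences spell out.
val grid = for (c1 <- 0 to 3; c2 <- 0 to 3) yield (c1, c2)
val zOrdered = grid.sortBy { case (c1, c2) => MortonSketch.interleave(c1, c2) }
println(zOrdered.take(4)) // Vector((0,0), (1,0), (0,1), (1,1))
```

Because nearby keys share high-order bits, rows that are close in all z-ordered columns end up close in the sorted output, which is what makes the layout effective for data skipping.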
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuite.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
new file mode 100644
index 00000000000..c2fa1619707
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuite.scala
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.{RebalancePartitions, Sort}
+import org.apache.spark.sql.internal.SQLConf
+
+import org.apache.kyuubi.sql.{KyuubiSQLConf, SparkKyuubiSparkSQLParser}
+import org.apache.kyuubi.sql.zorder.Zorder
+
+trait ZorderSuiteSpark extends ZorderSuiteBase {
+
+ test("Add rebalance before zorder") {
+ Seq("true" -> false, "false" -> true).foreach { case (useOriginalOrdering, zorder) =>
+ withSQLConf(
+ KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED.key -> "false",
+ KyuubiSQLConf.REBALANCE_BEFORE_ZORDER.key -> "true",
+ KyuubiSQLConf.REBALANCE_ZORDER_COLUMNS_ENABLED.key -> "true",
+ KyuubiSQLConf.ZORDER_USING_ORIGINAL_ORDERING_ENABLED.key -> useOriginalOrdering) {
+ withTable("t") {
+ sql(
+ """
+ |CREATE TABLE t (c1 int, c2 string) PARTITIONED BY (d string)
+ | TBLPROPERTIES (
+ |'kyuubi.zorder.enabled'= 'true',
+ |'kyuubi.zorder.cols'= 'c1,C2')
+ |""".stripMargin)
+ val p = sql("INSERT INTO TABLE t PARTITION(d='a') SELECT * FROM VALUES(1,'a')")
+ .queryExecution.analyzed
+ assert(p.collect {
+ case sort: Sort
+ if !sort.global &&
+ ((sort.order.exists(_.child.isInstanceOf[Zorder]) && zorder) ||
+ (!sort.order.exists(_.child.isInstanceOf[Zorder]) && !zorder)) => sort
+ }.size == 1)
+ assert(p.collect {
+ case rebalance: RebalancePartitions
+ if rebalance.references.map(_.name).exists(_.equals("c1")) => rebalance
+ }.size == 1)
+
+ val p2 = sql("INSERT INTO TABLE t PARTITION(d) SELECT * FROM VALUES(1,'a','b')")
+ .queryExecution.analyzed
+ assert(p2.collect {
+ case sort: Sort
+ if (!sort.global && Seq("c1", "c2", "d").forall(x =>
+ sort.references.map(_.name).exists(_.equals(x)))) &&
+ ((sort.order.exists(_.child.isInstanceOf[Zorder]) && zorder) ||
+ (!sort.order.exists(_.child.isInstanceOf[Zorder]) && !zorder)) => sort
+ }.size == 1)
+ assert(p2.collect {
+ case rebalance: RebalancePartitions
+ if Seq("c1", "c2", "d").forall(x =>
+ rebalance.references.map(_.name).exists(_.equals(x))) => rebalance
+ }.size == 1)
+ }
+ }
+ }
+ }
+
+ test("Two phase rebalance before Z-Order") {
+ withSQLConf(
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+ "org.apache.spark.sql.catalyst.optimizer.CollapseRepartition",
+ KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED.key -> "false",
+ KyuubiSQLConf.REBALANCE_BEFORE_ZORDER.key -> "true",
+ KyuubiSQLConf.TWO_PHASE_REBALANCE_BEFORE_ZORDER.key -> "true",
+ KyuubiSQLConf.REBALANCE_ZORDER_COLUMNS_ENABLED.key -> "true") {
+ withTable("t") {
+ sql(
+ """
+ |CREATE TABLE t (c1 int) PARTITIONED BY (d string)
+ | TBLPROPERTIES (
+ |'kyuubi.zorder.enabled'= 'true',
+ |'kyuubi.zorder.cols'= 'c1')
+ |""".stripMargin)
+ val p = sql("INSERT INTO TABLE t PARTITION(d) SELECT * FROM VALUES(1,'a')")
+ val rebalance = p.queryExecution.optimizedPlan.innerChildren
+ .flatMap(_.collect { case r: RebalancePartitions => r })
+ assert(rebalance.size == 2)
+ assert(rebalance.head.partitionExpressions.flatMap(_.references.map(_.name))
+ .contains("d"))
+ assert(rebalance.head.partitionExpressions.flatMap(_.references.map(_.name))
+ .contains("c1"))
+
+ assert(rebalance(1).partitionExpressions.flatMap(_.references.map(_.name))
+ .contains("d"))
+ assert(!rebalance(1).partitionExpressions.flatMap(_.references.map(_.name))
+ .contains("c1"))
+ }
+ }
+ }
+}
+
+trait ParserSuite { self: ZorderSuiteBase =>
+ override def createParser: ParserInterface = {
+ new SparkKyuubiSparkSQLParser(spark.sessionState.sqlParser)
+ }
+}
+
+class ZorderWithCodegenEnabledSuite
+ extends ZorderWithCodegenEnabledSuiteBase
+ with ZorderSuiteSpark
+ with ParserSuite {}
+class ZorderWithCodegenDisabledSuite
+ extends ZorderWithCodegenDisabledSuiteBase
+ with ZorderSuiteSpark
+ with ParserSuite {}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala
new file mode 100644
index 00000000000..2d3eec95722
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala
@@ -0,0 +1,768 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Ascending, AttributeReference, EqualTo, Expression, ExpressionEvalHelper, Literal, NullsLast, SortOrder}
+import org.apache.spark.sql.catalyst.parser.{ParseException, ParserInterface}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, OneRowRelation, Project, Sort}
+import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.types._
+
+import org.apache.kyuubi.sql.{KyuubiSQLConf, KyuubiSQLExtensionException}
+import org.apache.kyuubi.sql.zorder.{OptimizeZorderCommandBase, OptimizeZorderStatement, Zorder, ZorderBytesUtils}
+
+trait ZorderSuiteBase extends KyuubiSparkSQLExtensionTest with ExpressionEvalHelper {
+ override def sparkConf(): SparkConf = {
+ super.sparkConf()
+ .set(
+ StaticSQLConf.SPARK_SESSION_EXTENSIONS.key,
+ "org.apache.kyuubi.sql.KyuubiSparkSQLCommonExtension")
+ }
+
+ test("optimize unpartitioned table") {
+ withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1") {
+ withTable("up") {
+ sql(s"DROP TABLE IF EXISTS up")
+
+ val target = Seq(
+ Seq(0, 0),
+ Seq(1, 0),
+ Seq(0, 1),
+ Seq(1, 1),
+ Seq(2, 0),
+ Seq(3, 0),
+ Seq(2, 1),
+ Seq(3, 1),
+ Seq(0, 2),
+ Seq(1, 2),
+ Seq(0, 3),
+ Seq(1, 3),
+ Seq(2, 2),
+ Seq(3, 2),
+ Seq(2, 3),
+ Seq(3, 3))
+ sql(s"CREATE TABLE up (c1 INT, c2 INT, c3 INT)")
+ sql(s"INSERT INTO TABLE up VALUES" +
+ "(0,0,2),(0,1,2),(0,2,1),(0,3,3)," +
+ "(1,0,4),(1,1,2),(1,2,1),(1,3,3)," +
+ "(2,0,2),(2,1,1),(2,2,5),(2,3,5)," +
+ "(3,0,3),(3,1,4),(3,2,9),(3,3,0)")
+
+ val e = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE up WHERE c1 > 1 ZORDER BY c1, c2")
+ }
+ assert(e.getMessage == "Filters are only supported for partitioned table")
+
+ sql("OPTIMIZE up ZORDER BY c1, c2")
+ val res = sql("SELECT c1, c2 FROM up").collect()
+
+ assert(res.length == 16)
+
+ for (i <- target.indices) {
+ val t = target(i)
+ val r = res(i)
+ assert(t(0) == r.getInt(0))
+ assert(t(1) == r.getInt(1))
+ }
+ }
+ }
+ }
+
+ test("optimize partitioned table") {
+ withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1") {
+ withTable("p") {
+ sql("DROP TABLE IF EXISTS p")
+
+ val target = Seq(
+ Seq(0, 0),
+ Seq(1, 0),
+ Seq(0, 1),
+ Seq(1, 1),
+ Seq(2, 0),
+ Seq(3, 0),
+ Seq(2, 1),
+ Seq(3, 1),
+ Seq(0, 2),
+ Seq(1, 2),
+ Seq(0, 3),
+ Seq(1, 3),
+ Seq(2, 2),
+ Seq(3, 2),
+ Seq(2, 3),
+ Seq(3, 3))
+
+ sql(s"CREATE TABLE p (c1 INT, c2 INT, c3 INT) PARTITIONED BY (id INT)")
+ sql(s"ALTER TABLE p ADD PARTITION (id = 1)")
+ sql(s"ALTER TABLE p ADD PARTITION (id = 2)")
+ sql(s"INSERT INTO TABLE p PARTITION (id = 1) VALUES" +
+ "(0,0,2),(0,1,2),(0,2,1),(0,3,3)," +
+ "(1,0,4),(1,1,2),(1,2,1),(1,3,3)," +
+ "(2,0,2),(2,1,1),(2,2,5),(2,3,5)," +
+ "(3,0,3),(3,1,4),(3,2,9),(3,3,0)")
+ sql(s"INSERT INTO TABLE p PARTITION (id = 2) VALUES" +
+ "(0,0,2),(0,1,2),(0,2,1),(0,3,3)," +
+ "(1,0,4),(1,1,2),(1,2,1),(1,3,3)," +
+ "(2,0,2),(2,1,1),(2,2,5),(2,3,5)," +
+ "(3,0,3),(3,1,4),(3,2,9),(3,3,0)")
+
+ sql(s"OPTIMIZE p ZORDER BY c1, c2")
+
+ val res1 = sql(s"SELECT c1, c2 FROM p WHERE id = 1").collect()
+ val res2 = sql(s"SELECT c1, c2 FROM p WHERE id = 2").collect()
+
+ assert(res1.length == 16)
+ assert(res2.length == 16)
+
+ for (i <- target.indices) {
+ val t = target(i)
+ val r1 = res1(i)
+ assert(t(0) == r1.getInt(0))
+ assert(t(1) == r1.getInt(1))
+
+ val r2 = res2(i)
+ assert(t(0) == r2.getInt(0))
+ assert(t(1) == r2.getInt(1))
+ }
+ }
+ }
+ }
+
+ test("optimize partitioned table with filters") {
+ withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1") {
+ withTable("p") {
+ sql("DROP TABLE IF EXISTS p")
+
+ val target1 = Seq(
+ Seq(0, 0),
+ Seq(1, 0),
+ Seq(0, 1),
+ Seq(1, 1),
+ Seq(2, 0),
+ Seq(3, 0),
+ Seq(2, 1),
+ Seq(3, 1),
+ Seq(0, 2),
+ Seq(1, 2),
+ Seq(0, 3),
+ Seq(1, 3),
+ Seq(2, 2),
+ Seq(3, 2),
+ Seq(2, 3),
+ Seq(3, 3))
+ val target2 = Seq(
+ Seq(0, 0),
+ Seq(0, 1),
+ Seq(0, 2),
+ Seq(0, 3),
+ Seq(1, 0),
+ Seq(1, 1),
+ Seq(1, 2),
+ Seq(1, 3),
+ Seq(2, 0),
+ Seq(2, 1),
+ Seq(2, 2),
+ Seq(2, 3),
+ Seq(3, 0),
+ Seq(3, 1),
+ Seq(3, 2),
+ Seq(3, 3))
+ sql(s"CREATE TABLE p (c1 INT, c2 INT, c3 INT) PARTITIONED BY (id INT)")
+ sql(s"ALTER TABLE p ADD PARTITION (id = 1)")
+ sql(s"ALTER TABLE p ADD PARTITION (id = 2)")
+ sql(s"INSERT INTO TABLE p PARTITION (id = 1) VALUES" +
+ "(0,0,2),(0,1,2),(0,2,1),(0,3,3)," +
+ "(1,0,4),(1,1,2),(1,2,1),(1,3,3)," +
+ "(2,0,2),(2,1,1),(2,2,5),(2,3,5)," +
+ "(3,0,3),(3,1,4),(3,2,9),(3,3,0)")
+ sql(s"INSERT INTO TABLE p PARTITION (id = 2) VALUES" +
+ "(0,0,2),(0,1,2),(0,2,1),(0,3,3)," +
+ "(1,0,4),(1,1,2),(1,2,1),(1,3,3)," +
+ "(2,0,2),(2,1,1),(2,2,5),(2,3,5)," +
+ "(3,0,3),(3,1,4),(3,2,9),(3,3,0)")
+
+ val e = intercept[KyuubiSQLExtensionException](
+ sql(s"OPTIMIZE p WHERE id = 1 AND c1 > 1 ZORDER BY c1, c2"))
+ assert(e.getMessage == "Only partition column filters are allowed")
+
+ sql(s"OPTIMIZE p WHERE id = 1 ZORDER BY c1, c2")
+
+ val res1 = sql(s"SELECT c1, c2 FROM p WHERE id = 1").collect()
+ val res2 = sql(s"SELECT c1, c2 FROM p WHERE id = 2").collect()
+
+ assert(res1.length == 16)
+ assert(res2.length == 16)
+
+ for (i <- target1.indices) {
+ val t1 = target1(i)
+ val r1 = res1(i)
+ assert(t1(0) == r1.getInt(0))
+ assert(t1(1) == r1.getInt(1))
+
+ val t2 = target2(i)
+ val r2 = res2(i)
+ assert(t2(0) == r2.getInt(0))
+ assert(t2(1) == r2.getInt(1))
+ }
+ }
+ }
+ }
+
+ test("optimize zorder with datasource table") {
+ // TODO remove this if we support datasource table
+ withTable("t") {
+ sql("CREATE TABLE t (c1 int, c2 int) USING PARQUET")
+ val msg = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE t ZORDER BY c1, c2")
+ }.getMessage
+ assert(msg.contains("only support hive table"))
+ }
+ }
+
+ private def checkZorderTable(
+ enabled: Boolean,
+ cols: String,
+ planHasRepartition: Boolean,
+ resHasSort: Boolean): Unit = {
+ def checkSort(plan: LogicalPlan): Unit = {
+ assert(plan.isInstanceOf[Sort] === resHasSort)
+ plan match {
+ case sort: Sort =>
+ val colArr = cols.split(",")
+ val refs =
+ if (colArr.length == 1) {
+ sort.order.head
+ .child.asInstanceOf[AttributeReference] :: Nil
+ } else {
+ sort.order.head
+ .child.asInstanceOf[Zorder].children.map(_.references.head)
+ }
+ assert(refs.size === colArr.size)
+ refs.zip(colArr).foreach { case (ref, col) =>
+ assert(ref.name === col.trim)
+ }
+ case _ =>
+ }
+ }
+
+ val repartition =
+ if (planHasRepartition) {
+ "/*+ repartition */"
+ } else {
+ ""
+ }
+ withSQLConf("spark.sql.shuffle.partitions" -> "1") {
+ // hive
+ withSQLConf("spark.sql.hive.convertMetastoreParquet" -> "false") {
+ withTable("zorder_t1", "zorder_t2_true", "zorder_t2_false") {
+ sql(
+ s"""
+ |CREATE TABLE zorder_t1 (c1 int, c2 string, c3 long, c4 double) STORED AS PARQUET
+ |TBLPROPERTIES (
+ | 'kyuubi.zorder.enabled' = '$enabled',
+ | 'kyuubi.zorder.cols' = '$cols')
+ |""".stripMargin)
+ val df1 = sql(s"""
+ |INSERT INTO TABLE zorder_t1
+ |SELECT $repartition * FROM VALUES(1,'a',2,4D),(2,'b',3,6D)
+ |""".stripMargin)
+ assert(df1.queryExecution.analyzed.isInstanceOf[InsertIntoHiveTable])
+ checkSort(df1.queryExecution.analyzed.children.head)
+
+ Seq("true", "false").foreach { optimized =>
+ withSQLConf(
+ "spark.sql.hive.convertMetastoreCtas" -> optimized,
+ "spark.sql.hive.convertMetastoreParquet" -> optimized) {
+
+ withListener(
+ s"""
+ |CREATE TABLE zorder_t2_$optimized STORED AS PARQUET
+ |TBLPROPERTIES (
+ | 'kyuubi.zorder.enabled' = '$enabled',
+ | 'kyuubi.zorder.cols' = '$cols')
+ |
+ |SELECT $repartition * FROM
+ |VALUES(1,'a',2,4D),(2,'b',3,6D) AS t(c1 ,c2 , c3, c4)
+ |""".stripMargin) { write =>
+ if (optimized.toBoolean) {
+ assert(write.isInstanceOf[InsertIntoHadoopFsRelationCommand])
+ } else {
+ assert(write.isInstanceOf[InsertIntoHiveTable])
+ }
+ checkSort(write.query)
+ }
+ }
+ }
+ }
+ }
+
+ // datasource
+ withTable("zorder_t3", "zorder_t4") {
+ sql(
+ s"""
+ |CREATE TABLE zorder_t3 (c1 int, c2 string, c3 long, c4 double) USING PARQUET
+ |TBLPROPERTIES (
+ | 'kyuubi.zorder.enabled' = '$enabled',
+ | 'kyuubi.zorder.cols' = '$cols')
+ |""".stripMargin)
+ val df1 = sql(s"""
+ |INSERT INTO TABLE zorder_t3
+ |SELECT $repartition * FROM VALUES(1,'a',2,4D),(2,'b',3,6D)
+ |""".stripMargin)
+ assert(df1.queryExecution.analyzed.isInstanceOf[InsertIntoHadoopFsRelationCommand])
+ checkSort(df1.queryExecution.analyzed.children.head)
+
+ withListener(
+ s"""
+ |CREATE TABLE zorder_t4 USING PARQUET
+ |TBLPROPERTIES (
+ | 'kyuubi.zorder.enabled' = '$enabled',
+ | 'kyuubi.zorder.cols' = '$cols')
+ |
+ |SELECT $repartition * FROM
+ |VALUES(1,'a',2,4D),(2,'b',3,6D) AS t(c1 ,c2 , c3, c4)
+ |""".stripMargin) { write =>
+ assert(write.isInstanceOf[InsertIntoHadoopFsRelationCommand])
+ checkSort(write.query)
+ }
+ }
+ }
+ }
+
+ test("Support insert zorder by table properties") {
+ withSQLConf(KyuubiSQLConf.INSERT_ZORDER_BEFORE_WRITING.key -> "false") {
+ checkZorderTable(true, "c1", false, false)
+ checkZorderTable(false, "c1", false, false)
+ }
+ withSQLConf(KyuubiSQLConf.INSERT_ZORDER_BEFORE_WRITING.key -> "true") {
+ checkZorderTable(true, "", false, false)
+ checkZorderTable(true, "c5", false, false)
+ checkZorderTable(true, "c1,c5", false, false)
+ checkZorderTable(false, "c3", false, false)
+ checkZorderTable(true, "c3", true, false)
+ checkZorderTable(true, "c3", false, true)
+ checkZorderTable(true, "c2,c4", false, true)
+ checkZorderTable(true, "c4, c2, c1, c3", false, true)
+ }
+ }
+
+ test("zorder: check unsupported data type") {
+ def checkZorderPlan(zorder: Expression): Unit = {
+ val msg = intercept[AnalysisException] {
+ val plan = Project(Seq(Alias(zorder, "c")()), OneRowRelation())
+ spark.sessionState.analyzer.checkAnalysis(plan)
+ }.getMessage
+ // before Spark 3.2.0 the null type catalog string is null, after Spark 3.2.0 it's void
+ // see https://github.com/apache/spark/pull/33437
+ assert(msg.contains("Unsupported z-order type:") &&
+ (msg.contains("null") || msg.contains("void")))
+ }
+
+ checkZorderPlan(Zorder(Seq(Literal(null, NullType))))
+ checkZorderPlan(Zorder(Seq(Literal(1, IntegerType), Literal(null, NullType))))
+ }
+
+ test("zorder: check supported data type") {
+ val children = Seq(
+ Literal.create(false, BooleanType),
+ Literal.create(null, BooleanType),
+ Literal.create(1.toByte, ByteType),
+ Literal.create(null, ByteType),
+ Literal.create(1.toShort, ShortType),
+ Literal.create(null, ShortType),
+ Literal.create(1, IntegerType),
+ Literal.create(null, IntegerType),
+ Literal.create(1L, LongType),
+ Literal.create(null, LongType),
+ Literal.create(1f, FloatType),
+ Literal.create(null, FloatType),
+ Literal.create(1d, DoubleType),
+ Literal.create(null, DoubleType),
+ Literal.create("1", StringType),
+ Literal.create(null, StringType),
+ Literal.create(1L, TimestampType),
+ Literal.create(null, TimestampType),
+ Literal.create(1, DateType),
+ Literal.create(null, DateType),
+ Literal.create(BigDecimal(1, 1), DecimalType(1, 1)),
+ Literal.create(null, DecimalType(1, 1)))
+ val zorder = Zorder(children)
+ val plan = Project(Seq(Alias(zorder, "c")()), OneRowRelation())
+ spark.sessionState.analyzer.checkAnalysis(plan)
+ assert(zorder.foldable)
+
+// // scalastyle:off
+// val resultGen = org.apache.commons.codec.binary.Hex.encodeHex(
+// zorder.eval(InternalRow.fromSeq(children)).asInstanceOf[Array[Byte]], false)
+// resultGen.grouped(2).zipWithIndex.foreach { case (char, i) =>
+// print("0x" + char(0) + char(1) + ", ")
+// if ((i + 1) % 10 == 0) {
+// println()
+// }
+// }
+// // scalastyle:on
+
+ val expected = Array(
+ 0xFB, 0xEA, 0xAA, 0xBA, 0xAE, 0xAB, 0xAA, 0xEA, 0xBA, 0xAE, 0xAB, 0xAA, 0xEA, 0xBA, 0xA6,
+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xBA, 0xBB, 0xAA, 0xAA, 0xAA,
+ 0xBA, 0xAA, 0xBA, 0xAA, 0xBA, 0xAA, 0xBA, 0xAA, 0xBA, 0xAA, 0xBA, 0xAA, 0x9A, 0xAA, 0xAA,
+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xEA,
+ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
+ 0xAA, 0xAA, 0xBE, 0xAA, 0xAA, 0x8A, 0xBA, 0xAA, 0x2A, 0xEA, 0xA8, 0xAA, 0xAA, 0xA2, 0xAA,
+ 0xAA, 0x8A, 0xAA, 0xAA, 0x2F, 0xEB, 0xFE)
+ .map(_.toByte)
+ checkEvaluation(zorder, expected, InternalRow.fromSeq(children))
+ }
+
+ private def checkSort(input: DataFrame, expected: Seq[Row], dataType: Array[DataType]): Unit = {
+ withTempDir { dir =>
+ input.repartition(3).write.mode("overwrite").format("parquet").save(dir.getCanonicalPath)
+ val df = spark.read.format("parquet")
+ .load(dir.getCanonicalPath)
+ .repartition(1)
+ assert(df.schema.fields.map(_.dataType).sameElements(dataType))
+ val exprs = Seq("c1", "c2").map(col).map(_.expr)
+ val sortOrder = SortOrder(Zorder(exprs), Ascending, NullsLast, Seq.empty)
+ val zorderSort = Sort(Seq(sortOrder), true, df.logicalPlan)
+ val result = Dataset.ofRows(spark, zorderSort)
+ checkAnswer(result, expected)
+ }
+ }
+
+ test("sort with zorder -- boolean column") {
+ val schema = StructType(StructField("c1", BooleanType) :: StructField("c2", BooleanType) :: Nil)
+ val nonNullDF = spark.createDataFrame(
+ spark.sparkContext.parallelize(
+ Seq(Row(false, false), Row(false, true), Row(true, false), Row(true, true))),
+ schema)
+ val expected =
+ Row(false, false) :: Row(true, false) :: Row(false, true) :: Row(true, true) :: Nil
+ checkSort(nonNullDF, expected, Array(BooleanType, BooleanType))
+ val df = spark.createDataFrame(
+ spark.sparkContext.parallelize(
+ Seq(Row(false, false), Row(false, null), Row(null, false), Row(null, null))),
+ schema)
+ val expected2 =
+ Row(false, false) :: Row(null, false) :: Row(false, null) :: Row(null, null) :: Nil
+ checkSort(df, expected2, Array(BooleanType, BooleanType))
+ }
+
+ test("sort with zorder -- int column") {
+    // TODO: add more data type unit tests
+ val session = spark
+ import session.implicits._
+ // generate 4 * 4 matrix
+ val len = 3
+ val input = spark.range(len + 1).selectExpr("cast(id as int) as c1")
+ .select($"c1", explode(sequence(lit(0), lit(len))) as "c2")
+ val expected =
+ Row(0, 0) :: Row(1, 0) :: Row(0, 1) :: Row(1, 1) ::
+ Row(2, 0) :: Row(3, 0) :: Row(2, 1) :: Row(3, 1) ::
+ Row(0, 2) :: Row(1, 2) :: Row(0, 3) :: Row(1, 3) ::
+ Row(2, 2) :: Row(3, 2) :: Row(2, 3) :: Row(3, 3) :: Nil
+ checkSort(input, expected, Array(IntegerType, IntegerType))
+
+ // contains null value case.
+ val nullDF = spark.range(1).selectExpr("cast(null as int) as c1")
+ val input2 = spark.range(len).selectExpr("cast(id as int) as c1")
+ .union(nullDF)
+ .select(
+ $"c1",
+ explode(concat(sequence(lit(0), lit(len - 1)), array(lit(null)))) as "c2")
+ val expected2 = Row(0, 0) :: Row(1, 0) :: Row(0, 1) :: Row(1, 1) ::
+ Row(2, 0) :: Row(2, 1) :: Row(0, 2) :: Row(1, 2) ::
+ Row(2, 2) :: Row(null, 0) :: Row(null, 1) :: Row(null, 2) ::
+ Row(0, null) :: Row(1, null) :: Row(2, null) :: Row(null, null) :: Nil
+ checkSort(input2, expected2, Array(IntegerType, IntegerType))
+ }
+
+ test("sort with zorder -- string column") {
+ val schema = StructType(StructField("c1", StringType) :: StructField("c2", StringType) :: Nil)
+ val rdd = spark.sparkContext.parallelize(Seq(
+ Row("a", "a"),
+ Row("a", "b"),
+ Row("a", "c"),
+ Row("a", "d"),
+ Row("b", "a"),
+ Row("b", "b"),
+ Row("b", "c"),
+ Row("b", "d"),
+ Row("c", "a"),
+ Row("c", "b"),
+ Row("c", "c"),
+ Row("c", "d"),
+ Row("d", "a"),
+ Row("d", "b"),
+ Row("d", "c"),
+ Row("d", "d")))
+ val input = spark.createDataFrame(rdd, schema)
+ val expected = Row("a", "a") :: Row("b", "a") :: Row("c", "a") :: Row("a", "b") ::
+ Row("a", "c") :: Row("b", "b") :: Row("c", "b") :: Row("b", "c") ::
+ Row("c", "c") :: Row("d", "a") :: Row("d", "b") :: Row("d", "c") ::
+ Row("a", "d") :: Row("b", "d") :: Row("c", "d") :: Row("d", "d") :: Nil
+ checkSort(input, expected, Array(StringType, StringType))
+
+ val rdd2 = spark.sparkContext.parallelize(Seq(
+ Row(null, "a"),
+ Row("a", "b"),
+ Row("a", "c"),
+ Row("a", null),
+ Row("b", "a"),
+ Row(null, "b"),
+ Row("b", null),
+ Row("b", "d"),
+ Row("c", "a"),
+ Row("c", null),
+ Row(null, "c"),
+ Row("c", "d"),
+ Row("d", null),
+ Row("d", "b"),
+ Row("d", "c"),
+ Row(null, "d"),
+ Row(null, null)))
+ val input2 = spark.createDataFrame(rdd2, schema)
+ val expected2 = Row("b", "a") :: Row("c", "a") :: Row("a", "b") :: Row("a", "c") ::
+ Row("d", "b") :: Row("d", "c") :: Row("b", "d") :: Row("c", "d") ::
+ Row(null, "a") :: Row(null, "b") :: Row(null, "c") :: Row(null, "d") ::
+ Row("a", null) :: Row("b", null) :: Row("c", null) :: Row("d", null) ::
+ Row(null, null) :: Nil
+ checkSort(input2, expected2, Array(StringType, StringType))
+ }
+
+ test("test special value of short int long type") {
+ val df1 = spark.createDataFrame(Seq(
+ (-1, -1L),
+ (Int.MinValue, Int.MinValue.toLong),
+ (1, 1L),
+ (Int.MaxValue - 1, Int.MaxValue.toLong),
+ (Int.MaxValue - 1, Int.MaxValue.toLong - 1),
+ (Int.MaxValue, Int.MaxValue.toLong + 1),
+ (Int.MaxValue, Int.MaxValue.toLong))).toDF("c1", "c2")
+ val expected1 =
+ Row(Int.MinValue, Int.MinValue.toLong) ::
+ Row(-1, -1L) ::
+ Row(1, 1L) ::
+ Row(Int.MaxValue - 1, Int.MaxValue.toLong - 1) ::
+ Row(Int.MaxValue - 1, Int.MaxValue.toLong) ::
+ Row(Int.MaxValue, Int.MaxValue.toLong) ::
+ Row(Int.MaxValue, Int.MaxValue.toLong + 1) :: Nil
+ checkSort(df1, expected1, Array(IntegerType, LongType))
+
+ val df2 = spark.createDataFrame(Seq(
+ (-1, -1.toShort),
+ (Short.MinValue.toInt, Short.MinValue),
+ (1, 1.toShort),
+ (Short.MaxValue.toInt, (Short.MaxValue - 1).toShort),
+ (Short.MaxValue.toInt + 1, (Short.MaxValue - 1).toShort),
+ (Short.MaxValue.toInt, Short.MaxValue),
+ (Short.MaxValue.toInt + 1, Short.MaxValue))).toDF("c1", "c2")
+ val expected2 =
+ Row(Short.MinValue.toInt, Short.MinValue) ::
+ Row(-1, -1.toShort) ::
+ Row(1, 1.toShort) ::
+      Row(Short.MaxValue.toInt, (Short.MaxValue - 1).toShort) ::
+      Row(Short.MaxValue.toInt, Short.MaxValue) ::
+      Row(Short.MaxValue.toInt + 1, (Short.MaxValue - 1).toShort) ::
+ Row(Short.MaxValue.toInt + 1, Short.MaxValue) :: Nil
+ checkSort(df2, expected2, Array(IntegerType, ShortType))
+
+ val df3 = spark.createDataFrame(Seq(
+ (-1L, -1.toShort),
+ (Short.MinValue.toLong, Short.MinValue),
+ (1L, 1.toShort),
+ (Short.MaxValue.toLong, (Short.MaxValue - 1).toShort),
+ (Short.MaxValue.toLong + 1, (Short.MaxValue - 1).toShort),
+ (Short.MaxValue.toLong, Short.MaxValue),
+ (Short.MaxValue.toLong + 1, Short.MaxValue))).toDF("c1", "c2")
+ val expected3 =
+ Row(Short.MinValue.toLong, Short.MinValue) ::
+ Row(-1L, -1.toShort) ::
+ Row(1L, 1.toShort) ::
+      Row(Short.MaxValue.toLong, (Short.MaxValue - 1).toShort) ::
+      Row(Short.MaxValue.toLong, Short.MaxValue) ::
+      Row(Short.MaxValue.toLong + 1, (Short.MaxValue - 1).toShort) ::
+ Row(Short.MaxValue.toLong + 1, Short.MaxValue) :: Nil
+ checkSort(df3, expected3, Array(LongType, ShortType))
+ }
+
+ test("skip zorder if only requires one column") {
+ withTable("t") {
+ withSQLConf("spark.sql.hive.convertMetastoreParquet" -> "false") {
+ sql("CREATE TABLE t (c1 int, c2 string) stored as parquet")
+ val order1 = sql("OPTIMIZE t ZORDER BY c1").queryExecution.analyzed
+ .asInstanceOf[OptimizeZorderCommandBase].query.asInstanceOf[Sort].order.head.child
+ assert(!order1.isInstanceOf[Zorder])
+ assert(order1.isInstanceOf[AttributeReference])
+ }
+ }
+ }
+
+ test("Add config to control if zorder using global sort") {
+ withTable("t") {
+ withSQLConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED.key -> "false") {
+ sql(
+ """
+ |CREATE TABLE t (c1 int, c2 string) TBLPROPERTIES (
+ |'kyuubi.zorder.enabled'= 'true',
+ |'kyuubi.zorder.cols'= 'c1,c2')
+ |""".stripMargin)
+ val p1 = sql("OPTIMIZE t ZORDER BY c1, c2").queryExecution.analyzed
+ assert(p1.collect {
+ case shuffle: Sort if !shuffle.global => shuffle
+ }.size == 1)
+
+ val p2 = sql("INSERT INTO TABLE t SELECT * FROM VALUES(1,'a')").queryExecution.analyzed
+ assert(p2.collect {
+ case shuffle: Sort if !shuffle.global => shuffle
+ }.size == 1)
+ }
+ }
+ }
+
+ test("fast approach test") {
+ Seq[Seq[Any]](
+ Seq(1L, 2L),
+ Seq(1L, 2L, 3L),
+ Seq(1L, 2L, 3L, 4L),
+ Seq(1L, 2L, 3L, 4L, 5L),
+ Seq(1L, 2L, 3L, 4L, 5L, 6L),
+ Seq(1L, 2L, 3L, 4L, 5L, 6L, 7L),
+ Seq(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L))
+ .foreach { inputs =>
+ assert(java.util.Arrays.equals(
+ ZorderBytesUtils.interleaveBits(inputs.toArray),
+ ZorderBytesUtils.interleaveBitsDefault(inputs.map(ZorderBytesUtils.toByteArray).toArray)))
+ }
+ }
+
+ test("OPTIMIZE command is parsed as expected") {
+ val parser = createParser
+ val globalSort = spark.conf.get(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED)
+
+ assert(parser.parsePlan("OPTIMIZE p zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(Seq(UnresolvedStar(None)), UnresolvedRelation(TableIdentifier("p"))))))
+
+ assert(parser.parsePlan("OPTIMIZE p zorder by c1, c2") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(
+ Zorder(Seq(UnresolvedAttribute("c1"), UnresolvedAttribute("c2"))),
+ Ascending,
+ NullsLast,
+ Seq.empty) :: Nil,
+ globalSort,
+ Project(Seq(UnresolvedStar(None)), UnresolvedRelation(TableIdentifier("p"))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = 1 zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(UnresolvedAttribute("id"), Literal(1)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = 1 zorder by c1, c2") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(
+ Zorder(Seq(UnresolvedAttribute("c1"), UnresolvedAttribute("c2"))),
+ Ascending,
+ NullsLast,
+ Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(UnresolvedAttribute("id"), Literal(1)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = current_date() zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(
+ UnresolvedAttribute("id"),
+ UnresolvedFunction("current_date", Seq.empty, false)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+    // TODO: add support for the following cases
+ intercept[ParseException] {
+ parser.parsePlan("OPTIMIZE p zorder by (c1)")
+ }
+
+ intercept[ParseException] {
+ parser.parsePlan("OPTIMIZE p zorder by (c1, c2)")
+ }
+ }
+
+ test("OPTIMIZE partition predicates constraint") {
+ withTable("p") {
+ sql("CREATE TABLE p (c1 INT, c2 INT) PARTITIONED BY (event_date DATE)")
+ val e1 = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE p WHERE event_date = current_date as c ZORDER BY c1, c2")
+ }
+ assert(e1.getMessage.contains("unsupported partition predicates"))
+
+ val e2 = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE p WHERE c1 = 1 ZORDER BY c1, c2")
+ }
+ assert(e2.getMessage == "Only partition column filters are allowed")
+ }
+ }
+
+ def createParser: ParserInterface
+}
+
+trait ZorderWithCodegenEnabledSuiteBase extends ZorderSuiteBase {
+ override def sparkConf(): SparkConf = {
+ val conf = super.sparkConf
+ conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+ conf
+ }
+}
+
+trait ZorderWithCodegenDisabledSuiteBase extends ZorderSuiteBase {
+ override def sparkConf(): SparkConf = {
+ val conf = super.sparkConf
+ conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")
+ conf.set(SQLConf.CODEGEN_FACTORY_MODE.key, "NO_CODEGEN")
+ conf
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala
new file mode 100644
index 00000000000..b891a7224a0
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-3-4/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.benchmark
+
+import java.io.{File, FileOutputStream, OutputStream}
+
+import scala.collection.JavaConverters._
+
+import com.google.common.reflect.ClassPath
+import org.scalatest.Assertions._
+
+trait KyuubiBenchmarkBase {
+ var output: Option[OutputStream] = None
+
+ private val prefix = {
+ val benchmarkClasses = ClassPath.from(Thread.currentThread.getContextClassLoader)
+ .getTopLevelClassesRecursive("org.apache.spark.sql").asScala.toArray
+ assert(benchmarkClasses.nonEmpty)
+ val benchmark = benchmarkClasses.find(_.load().getName.endsWith("Benchmark"))
+ val targetDirOrProjDir =
+ new File(benchmark.get.load().getProtectionDomain.getCodeSource.getLocation.toURI)
+ .getParentFile.getParentFile
+ if (targetDirOrProjDir.getName == "target") {
+ targetDirOrProjDir.getParentFile.getCanonicalPath + "/"
+ } else {
+ targetDirOrProjDir.getCanonicalPath + "/"
+ }
+ }
+
+ def withHeader(func: => Unit): Unit = {
+ val version = System.getProperty("java.version").split("\\D+")(0).toInt
+ val jdkString = if (version > 8) s"-jdk$version" else ""
+ val resultFileName =
+ s"${this.getClass.getSimpleName.replace("$", "")}$jdkString-results.txt"
+ val dir = new File(s"${prefix}benchmarks/")
+ if (!dir.exists()) {
+ // scalastyle:off println
+ println(s"Creating ${dir.getAbsolutePath} for benchmark results.")
+ // scalastyle:on println
+ dir.mkdirs()
+ }
+ val file = new File(dir, resultFileName)
+ if (!file.exists()) {
+ file.createNewFile()
+ }
+ output = Some(new FileOutputStream(file))
+
+ func
+
+ output.foreach { o =>
+ if (o != null) {
+ o.close()
+ }
+ }
+ }
+}
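`withHeader` above derives the benchmark results file name from the suite class name and the JDK major version, suffixing `-jdkN` only for JDK 9+. A small Python model of that naming rule (the helper name is hypothetical; the version split mirrors the `java.version.split("\\D+")(0)` call in the Scala):

```python
import re

def result_file_name(class_name, java_version):
    # First run of digits in java.version: "17.0.1" -> 17, "1.8.0_292" -> 1,
    # so legacy 1.x versions never pass the `major > 8` check and get no suffix.
    major = int(re.split(r"\D+", java_version)[0])
    jdk = f"-jdk{major}" if major > 8 else ""
    return f"{class_name.replace('$', '')}{jdk}-results.txt"
```

Note the quirk this inherits from the Scala: JDK 8 reports `java.version` as `1.8.x`, so its parsed "major" is 1, which is why the suffix threshold works.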
diff --git a/extensions/spark/kyuubi-extension-spark-common/pom.xml b/extensions/spark/kyuubi-extension-spark-common/pom.xml
index 6d4bd144369..259931a2e2f 100644
--- a/extensions/spark/kyuubi-extension-spark-common/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-common/pom.xml
@@ -21,11 +21,11 @@
    <parent>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
        <relativePath>../../../pom.xml</relativePath>
    </parent>

-    <artifactId>kyuubi-extension-spark-common_2.12</artifactId>
+    <artifactId>kyuubi-extension-spark-common_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
    <name>Kyuubi Dev Spark Extensions Common (for Spark 3)</name>
    <url>https://kyuubi.apache.org/</url>
@@ -110,10 +110,21 @@
            <artifactId>jakarta.xml.bind-api</artifactId>
            <scope>test</scope>
        </dependency>
+
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-1.2-api</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-slf4j-impl</artifactId>
+            <scope>test</scope>
+        </dependency>
-
org.antlr
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4 b/extensions/spark/kyuubi-extension-spark-common/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4
index 63e2bf84813..e52b7f5cfeb 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSparkSQL.g4
@@ -55,53 +55,23 @@ statement
;
whereClause
- : WHERE booleanExpression
+ : WHERE partitionPredicate = predicateToken
;
zorderClause
: ZORDER BY order+=multipartIdentifier (',' order+=multipartIdentifier)*
;
-booleanExpression
- : query #logicalQuery
- | left=booleanExpression operator=AND right=booleanExpression #logicalBinary
- | left=booleanExpression operator=OR right=booleanExpression #logicalBinary
- ;
-
-query
- : '('? multipartIdentifier comparisonOperator constant ')'?
- ;
-
-comparisonOperator
- : EQ | NEQ | NEQJ | LT | LTE | GT | GTE | NSEQ
- ;
-
-constant
- : NULL #nullLiteral
- | identifier STRING #typeConstructor
- | number #numericLiteral
- | booleanValue #booleanLiteral
- | STRING+ #stringLiteral
+// We don't have an expression rule in our grammar here, so we just grab the tokens and defer
+// parsing them until later.
+predicateToken
+ : .+?
;
multipartIdentifier
: parts+=identifier ('.' parts+=identifier)*
;
-booleanValue
- : TRUE | FALSE
- ;
-
-number
- : MINUS? DECIMAL_VALUE #decimalLiteral
- | MINUS? INTEGER_VALUE #integerLiteral
- | MINUS? BIGINT_LITERAL #bigIntLiteral
- | MINUS? SMALLINT_LITERAL #smallIntLiteral
- | MINUS? TINYINT_LITERAL #tinyIntLiteral
- | MINUS? DOUBLE_LITERAL #doubleLiteral
- | MINUS? BIGDECIMAL_LITERAL #bigDecimalLiteral
- ;
-
identifier
: strictIdentifier
;
@@ -136,7 +106,6 @@ BY: 'BY';
FALSE: 'FALSE';
DATE: 'DATE';
INTERVAL: 'INTERVAL';
-NULL: 'NULL';
OPTIMIZE: 'OPTIMIZE';
OR: 'OR';
TABLE: 'TABLE';
@@ -145,22 +114,8 @@ TRUE: 'TRUE';
WHERE: 'WHERE';
ZORDER: 'ZORDER';
-EQ : '=' | '==';
-NSEQ: '<=>';
-NEQ : '<>';
-NEQJ: '!=';
-LT : '<';
-LTE : '<=' | '!>';
-GT : '>';
-GTE : '>=' | '!<';
-
MINUS: '-';
-STRING
- : '\'' ( ~('\''|'\\') | ('\\' .) )* '\''
- | '"' ( ~('"'|'\\') | ('\\' .) )* '"'
- ;
-
BIGINT_LITERAL
: DIGIT+ 'L'
;
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala
index 4df924b519f..6f45dae126e 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSQLConf.scala
@@ -17,6 +17,7 @@
package org.apache.kyuubi.sql
+import org.apache.spark.network.util.ByteUnit
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf._
@@ -33,7 +34,8 @@ object KyuubiSQLConf {
buildConf("spark.sql.optimizer.insertRepartitionNum")
.doc(s"The partition number if ${INSERT_REPARTITION_BEFORE_WRITE.key} is enabled. " +
s"If AQE is disabled, the default value is ${SQLConf.SHUFFLE_PARTITIONS.key}. " +
- "If AQE is enabled, the default value is none that means depend on AQE.")
+ "If AQE is enabled, the default value is none that means depend on AQE. " +
+ "This config is used for Spark 3.1 only.")
.version("1.2.0")
.intConf
.createOptional
@@ -138,13 +140,23 @@ object KyuubiSQLConf {
val WATCHDOG_MAX_PARTITIONS =
buildConf("spark.sql.watchdog.maxPartitions")
.doc("Set the max partition number when spark scans a data source. " +
- "Enable MaxPartitionStrategy by specifying this configuration. " +
+ "Enable maxPartitions Strategy by specifying this configuration. " +
"Add maxPartitions Strategy to avoid scan excessive partitions " +
"on partitioned table, it's optional that works with defined")
.version("1.4.0")
.intConf
.createOptional
+ val WATCHDOG_MAX_FILE_SIZE =
+ buildConf("spark.sql.watchdog.maxFileSize")
+ .doc("Set the maximum size in bytes of files when spark scans a data source. " +
+ "Enable maxFileSize Strategy by specifying this configuration. " +
+ "Add maxFileSize Strategy to avoid scan excessive size of files," +
+ " it's optional that works with defined")
+ .version("1.8.0")
+ .bytesConf(ByteUnit.BYTE)
+ .createOptional
+
val WATCHDOG_FORCED_MAXOUTPUTROWS =
buildConf("spark.sql.watchdog.forcedMaxOutputRows")
.doc("Add ForcedMaxOutputRows rule to avoid huge output rows of non-limit query " +
@@ -198,6 +210,21 @@ object KyuubiSQLConf {
.booleanConf
.createWithDefault(false)
+ val FINAL_WRITE_STAGE_EAGERLY_KILL_EXECUTORS_KILL_ALL =
+ buildConf("spark.sql.finalWriteStage.eagerlyKillExecutors.killAll")
+ .doc("When true, eagerly kill all executors before running final write stage. " +
+ "Mainly for test.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val FINAL_WRITE_STAGE_SKIP_KILLING_EXECUTORS_FOR_TABLE_CACHE =
+ buildConf("spark.sql.finalWriteStage.skipKillingExecutorsForTableCache")
+ .doc("When true, skip killing executors if the plan has table caches.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(true)
+
val FINAL_WRITE_STAGE_PARTITION_FACTOR =
buildConf("spark.sql.finalWriteStage.retainExecutorsFactor")
.doc("If the target executors * factor < active executors, and " +
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala
index 9f1958b0905..cc00bf88e94 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/KyuubiSparkSQLAstBuilder.scala
@@ -17,37 +17,81 @@
package org.apache.kyuubi.sql
-import java.time.LocalDate
-import java.util.Locale
-
import scala.collection.JavaConverters.asScalaBufferConverter
-import scala.collection.mutable.{ArrayBuffer, ListBuffer}
-import scala.util.control.NonFatal
+import scala.collection.mutable.ListBuffer
import org.antlr.v4.runtime.ParserRuleContext
-import org.antlr.v4.runtime.tree.{ParseTree, TerminalNode}
-import org.apache.commons.codec.binary.Hex
-import org.apache.spark.sql.AnalysisException
+import org.antlr.v4.runtime.misc.Interval
+import org.antlr.v4.runtime.tree.ParseTree
+import org.apache.spark.sql.catalyst.SQLConfHelper
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.parser.ParseException
-import org.apache.spark.sql.catalyst.parser.ParserUtils.{string, stringWithoutUnescape, withOrigin}
+import org.apache.spark.sql.catalyst.parser.ParserUtils.withOrigin
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project, Sort}
-import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getZoneId, localDateToDays, stringToTimestamp}
-import org.apache.spark.sql.catalyst.util.IntervalUtils
-import org.apache.spark.sql.hive.HiveAnalysis.conf
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.UTF8String
import org.apache.kyuubi.sql.KyuubiSparkSQLParser._
-import org.apache.kyuubi.sql.zorder.{OptimizeZorderStatement, OptimizeZorderStatementBase, Zorder, ZorderBase}
+import org.apache.kyuubi.sql.zorder.{OptimizeZorderStatement, Zorder}
+
+class KyuubiSparkSQLAstBuilder extends KyuubiSparkSQLBaseVisitor[AnyRef] with SQLConfHelper {
+
+ def buildOptimizeStatement(
+ unparsedPredicateOptimize: UnparsedPredicateOptimize,
+ parseExpression: String => Expression): LogicalPlan = {
-abstract class KyuubiSparkSQLAstBuilderBase extends KyuubiSparkSQLBaseVisitor[AnyRef] {
- def buildZorder(child: Seq[Expression]): ZorderBase
- def buildOptimizeZorderStatement(
- tableIdentifier: Seq[String],
- query: LogicalPlan): OptimizeZorderStatementBase
+ val UnparsedPredicateOptimize(tableIdent, tablePredicate, orderExpr) =
+ unparsedPredicateOptimize
+
+ val predicate = tablePredicate.map(parseExpression)
+ verifyPartitionPredicates(predicate)
+ val table = UnresolvedRelation(tableIdent)
+ val tableWithFilter = predicate match {
+ case Some(expr) => Filter(expr, table)
+ case None => table
+ }
+ val query =
+ Sort(
+ SortOrder(orderExpr, Ascending, NullsLast, Seq.empty) :: Nil,
+ conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED),
+ Project(Seq(UnresolvedStar(None)), tableWithFilter))
+ OptimizeZorderStatement(tableIdent, query)
+ }
+
+ private def verifyPartitionPredicates(predicates: Option[Expression]): Unit = {
+ predicates.foreach {
+ case p if !isLikelySelective(p) =>
+ throw new KyuubiSQLExtensionException(s"unsupported partition predicates: ${p.sql}")
+ case _ =>
+ }
+ }
+
+ /**
+   * Forked from Apache Spark's org.apache.spark.sql.catalyst.expressions.PredicateHelper.
+   * `PredicateHelper.isLikelySelective()` is available since Spark 3.3; it is forked here
+   * to support Spark versions lower than 3.3.
+   *
+   * Returns whether an expression is likely to be selective.
+   */
+ private def isLikelySelective(e: Expression): Boolean = e match {
+ case Not(expr) => isLikelySelective(expr)
+ case And(l, r) => isLikelySelective(l) || isLikelySelective(r)
+ case Or(l, r) => isLikelySelective(l) && isLikelySelective(r)
+ case _: StringRegexExpression => true
+ case _: BinaryComparison => true
+ case _: In | _: InSet => true
+ case _: StringPredicate => true
+ case BinaryPredicate(_) => true
+ case _: MultiLikeBase => true
+ case _ => false
+ }
+
+ private object BinaryPredicate {
+ def unapply(expr: Expression): Option[Expression] = expr match {
+ case _: Contains => Option(expr)
+ case _: StartsWith => Option(expr)
+ case _: EndsWith => Option(expr)
+ case _ => None
+ }
+ }
/**
* Create an expression from the given context. This method just passes the context on to the
@@ -62,21 +106,12 @@ abstract class KyuubiSparkSQLAstBuilderBase extends KyuubiSparkSQLBaseVisitor[An
}
override def visitOptimizeZorder(
- ctx: OptimizeZorderContext): LogicalPlan = withOrigin(ctx) {
+ ctx: OptimizeZorderContext): UnparsedPredicateOptimize = withOrigin(ctx) {
val tableIdent = multiPart(ctx.multipartIdentifier())
- val table = UnresolvedRelation(tableIdent)
-
- val whereClause =
- if (ctx.whereClause() == null) {
- None
- } else {
- Option(expression(ctx.whereClause().booleanExpression()))
- }
- val tableWithFilter = whereClause match {
- case Some(expr) => Filter(expr, table)
- case None => table
- }
+ val predicate = Option(ctx.whereClause())
+ .map(_.partitionPredicate)
+ .map(extractRawText(_))
val zorderCols = ctx.zorderClause().order.asScala
.map(visitMultipartIdentifier)
@@ -87,364 +122,53 @@ abstract class KyuubiSparkSQLAstBuilderBase extends KyuubiSparkSQLBaseVisitor[An
if (zorderCols.length == 1) {
zorderCols.head
} else {
- buildZorder(zorderCols)
+ Zorder(zorderCols)
}
- val query =
- Sort(
- SortOrder(orderExpr, Ascending, NullsLast, Seq.empty) :: Nil,
- conf.getConf(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED),
- Project(Seq(UnresolvedStar(None)), tableWithFilter))
-
- buildOptimizeZorderStatement(tableIdent, query)
+ UnparsedPredicateOptimize(tableIdent, predicate, orderExpr)
}
override def visitPassThrough(ctx: PassThroughContext): LogicalPlan = null
- override def visitQuery(ctx: QueryContext): Expression = withOrigin(ctx) {
- val left = new UnresolvedAttribute(multiPart(ctx.multipartIdentifier()))
- val right = expression(ctx.constant())
- val operator = ctx.comparisonOperator().getChild(0).asInstanceOf[TerminalNode]
- operator.getSymbol.getType match {
- case KyuubiSparkSQLParser.EQ =>
- EqualTo(left, right)
- case KyuubiSparkSQLParser.NSEQ =>
- EqualNullSafe(left, right)
- case KyuubiSparkSQLParser.NEQ | KyuubiSparkSQLParser.NEQJ =>
- Not(EqualTo(left, right))
- case KyuubiSparkSQLParser.LT =>
- LessThan(left, right)
- case KyuubiSparkSQLParser.LTE =>
- LessThanOrEqual(left, right)
- case KyuubiSparkSQLParser.GT =>
- GreaterThan(left, right)
- case KyuubiSparkSQLParser.GTE =>
- GreaterThanOrEqual(left, right)
- }
- }
-
- override def visitLogicalBinary(ctx: LogicalBinaryContext): Expression = withOrigin(ctx) {
- val expressionType = ctx.operator.getType
- val expressionCombiner = expressionType match {
- case KyuubiSparkSQLParser.AND => And.apply _
- case KyuubiSparkSQLParser.OR => Or.apply _
- }
-
- // Collect all similar left hand contexts.
- val contexts = ArrayBuffer(ctx.right)
- var current = ctx.left
- def collectContexts: Boolean = current match {
- case lbc: LogicalBinaryContext if lbc.operator.getType == expressionType =>
- contexts += lbc.right
- current = lbc.left
- true
- case _ =>
- contexts += current
- false
- }
- while (collectContexts) {
- // No body - all updates take place in the collectContexts.
- }
-
- // Reverse the contexts to have them in the same sequence as in the SQL statement & turn them
- // into expressions.
- val expressions = contexts.reverseMap(expression)
-
- // Create a balanced tree.
- def reduceToExpressionTree(low: Int, high: Int): Expression = high - low match {
- case 0 =>
- expressions(low)
- case 1 =>
- expressionCombiner(expressions(low), expressions(high))
- case x =>
- val mid = low + x / 2
- expressionCombiner(
- reduceToExpressionTree(low, mid),
- reduceToExpressionTree(mid + 1, high))
- }
- reduceToExpressionTree(0, expressions.size - 1)
- }
-
override def visitMultipartIdentifier(ctx: MultipartIdentifierContext): Seq[String] =
withOrigin(ctx) {
- ctx.parts.asScala.map(_.getText)
+ ctx.parts.asScala.map(_.getText).toSeq
}
override def visitZorderClause(ctx: ZorderClauseContext): Seq[UnresolvedAttribute] =
withOrigin(ctx) {
val res = ListBuffer[UnresolvedAttribute]()
ctx.multipartIdentifier().forEach { identifier =>
- res += UnresolvedAttribute(identifier.parts.asScala.map(_.getText))
+ res += UnresolvedAttribute(identifier.parts.asScala.map(_.getText).toSeq)
}
- res
- }
-
- /**
- * Create a NULL literal expression.
- */
- override def visitNullLiteral(ctx: NullLiteralContext): Literal = withOrigin(ctx) {
- Literal(null)
- }
-
- /**
- * Create a Boolean literal expression.
- */
- override def visitBooleanLiteral(ctx: BooleanLiteralContext): Literal = withOrigin(ctx) {
- if (ctx.getText.toBoolean) {
- Literal.TrueLiteral
- } else {
- Literal.FalseLiteral
+ res.toSeq
}
- }
-
- /**
- * Create a typed Literal expression. A typed literal has the following SQL syntax:
- * {{{
- * [TYPE] '[VALUE]'
- * }}}
- * Currently Date, Timestamp, Interval and Binary typed literals are supported.
- */
- override def visitTypeConstructor(ctx: TypeConstructorContext): Literal = withOrigin(ctx) {
- val value = string(ctx.STRING)
- val valueType = ctx.identifier.getText.toUpperCase(Locale.ROOT)
-
- def toLiteral[T](f: UTF8String => Option[T], t: DataType): Literal = {
- f(UTF8String.fromString(value)).map(Literal(_, t)).getOrElse {
- throw new ParseException(s"Cannot parse the $valueType value: $value", ctx)
- }
- }
- try {
- valueType match {
- case "DATE" =>
- toLiteral(stringToDate, DateType)
- case "TIMESTAMP" =>
- val zoneId = getZoneId(SQLConf.get.sessionLocalTimeZone)
- toLiteral(stringToTimestamp(_, zoneId), TimestampType)
- case "INTERVAL" =>
- val interval =
- try {
- IntervalUtils.stringToInterval(UTF8String.fromString(value))
- } catch {
- case e: IllegalArgumentException =>
- val ex = new ParseException("Cannot parse the INTERVAL value: " + value, ctx)
- ex.setStackTrace(e.getStackTrace)
- throw ex
- }
- Literal(interval, CalendarIntervalType)
- case "X" =>
- val padding = if (value.length % 2 != 0) "0" else ""
-
- Literal(Hex.decodeHex(padding + value))
- case other =>
- throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx)
- }
- } catch {
- case e: IllegalArgumentException =>
- val message = Option(e.getMessage).getOrElse(s"Exception parsing $valueType")
- throw new ParseException(message, ctx)
- }
- }
-
- /**
- * Create a String literal expression.
- */
- override def visitStringLiteral(ctx: StringLiteralContext): Literal = withOrigin(ctx) {
- Literal(createString(ctx))
- }
-
- /**
- * Create a decimal literal for a regular decimal number.
- */
- override def visitDecimalLiteral(ctx: DecimalLiteralContext): Literal = withOrigin(ctx) {
- Literal(BigDecimal(ctx.getText).underlying())
- }
-
- /** Create a numeric literal expression. */
- private def numericLiteral(
- ctx: NumberContext,
- rawStrippedQualifier: String,
- minValue: BigDecimal,
- maxValue: BigDecimal,
- typeName: String)(converter: String => Any): Literal = withOrigin(ctx) {
- try {
- val rawBigDecimal = BigDecimal(rawStrippedQualifier)
- if (rawBigDecimal < minValue || rawBigDecimal > maxValue) {
- throw new ParseException(
- s"Numeric literal ${rawStrippedQualifier} does not " +
- s"fit in range [${minValue}, ${maxValue}] for type ${typeName}",
- ctx)
- }
- Literal(converter(rawStrippedQualifier))
- } catch {
- case e: NumberFormatException =>
- throw new ParseException(e.getMessage, ctx)
- }
- }
-
- /**
- * Create a Byte Literal expression.
- */
- override def visitTinyIntLiteral(ctx: TinyIntLiteralContext): Literal = {
- val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1)
- numericLiteral(
- ctx,
- rawStrippedQualifier,
- Byte.MinValue,
- Byte.MaxValue,
- ByteType.simpleString)(_.toByte)
- }
-
- /**
- * Create an integral literal expression. The code selects the most narrow integral type
- * possible, either a BigDecimal, a Long or an Integer is returned.
- */
- override def visitIntegerLiteral(ctx: IntegerLiteralContext): Literal = withOrigin(ctx) {
- BigDecimal(ctx.getText) match {
- case v if v.isValidInt =>
- Literal(v.intValue)
- case v if v.isValidLong =>
- Literal(v.longValue)
- case v => Literal(v.underlying())
- }
- }
-
- /**
- * Create a Short Literal expression.
- */
- override def visitSmallIntLiteral(ctx: SmallIntLiteralContext): Literal = {
- val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1)
- numericLiteral(
- ctx,
- rawStrippedQualifier,
- Short.MinValue,
- Short.MaxValue,
- ShortType.simpleString)(_.toShort)
- }
-
- /**
- * Create a Long Literal expression.
- */
- override def visitBigIntLiteral(ctx: BigIntLiteralContext): Literal = {
- val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1)
- numericLiteral(
- ctx,
- rawStrippedQualifier,
- Long.MinValue,
- Long.MaxValue,
- LongType.simpleString)(_.toLong)
- }
-
- /**
- * Create a Double Literal expression.
- */
- override def visitDoubleLiteral(ctx: DoubleLiteralContext): Literal = {
- val rawStrippedQualifier = ctx.getText.substring(0, ctx.getText.length - 1)
- numericLiteral(
- ctx,
- rawStrippedQualifier,
- Double.MinValue,
- Double.MaxValue,
- DoubleType.simpleString)(_.toDouble)
- }
-
- /**
- * Create a BigDecimal Literal expression.
- */
- override def visitBigDecimalLiteral(ctx: BigDecimalLiteralContext): Literal = {
- val raw = ctx.getText.substring(0, ctx.getText.length - 2)
- try {
- Literal(BigDecimal(raw).underlying())
- } catch {
- case e: AnalysisException =>
- throw new ParseException(e.message, ctx)
- }
- }
-
- /**
- * Create a String from a string literal context. This supports multiple consecutive string
- * literals, these are concatenated, for example this expression "'hello' 'world'" will be
- * converted into "helloworld".
- *
- * Special characters can be escaped by using Hive/C-style escaping.
- */
- private def createString(ctx: StringLiteralContext): String = {
- if (conf.escapedStringLiterals) {
- ctx.STRING().asScala.map(stringWithoutUnescape).mkString
- } else {
- ctx.STRING().asScala.map(string).mkString
- }
- }
private def typedVisit[T](ctx: ParseTree): T = {
ctx.accept(this).asInstanceOf[T]
}
- private def stringToDate(s: UTF8String): Option[Int] = {
- def isValidDigits(segment: Int, digits: Int): Boolean = {
- // An integer is able to represent a date within [+-]5 million years.
- var maxDigitsYear = 7
- (segment == 0 && digits >= 4 && digits <= maxDigitsYear) ||
- (segment != 0 && digits > 0 && digits <= 2)
- }
- if (s == null || s.trimAll().numBytes() == 0) {
- return None
- }
- val segments: Array[Int] = Array[Int](1, 1, 1)
- var sign = 1
- var i = 0
- var currentSegmentValue = 0
- var currentSegmentDigits = 0
- val bytes = s.trimAll().getBytes
- var j = 0
- if (bytes(j) == '-' || bytes(j) == '+') {
- sign = if (bytes(j) == '-') -1 else 1
- j += 1
- }
- while (j < bytes.length && (i < 3 && !(bytes(j) == ' ' || bytes(j) == 'T'))) {
- val b = bytes(j)
- if (i < 2 && b == '-') {
- if (!isValidDigits(i, currentSegmentDigits)) {
- return None
- }
- segments(i) = currentSegmentValue
- currentSegmentValue = 0
- currentSegmentDigits = 0
- i += 1
- } else {
- val parsedValue = b - '0'.toByte
- if (parsedValue < 0 || parsedValue > 9) {
- return None
- } else {
- currentSegmentValue = currentSegmentValue * 10 + parsedValue
- currentSegmentDigits += 1
- }
- }
- j += 1
- }
- if (!isValidDigits(i, currentSegmentDigits)) {
- return None
- }
- if (i < 2 && j < bytes.length) {
- // For the `yyyy` and `yyyy-[m]m` formats, entire input must be consumed.
- return None
- }
- segments(i) = currentSegmentValue
- try {
- val localDate = LocalDate.of(sign * segments(0), segments(1), segments(2))
- Some(localDateToDays(localDate))
- } catch {
- case NonFatal(_) => None
- }
+ private def extractRawText(exprContext: ParserRuleContext): String = {
+ // Extract the raw expression which will be parsed later
+ exprContext.getStart.getInputStream.getText(new Interval(
+ exprContext.getStart.getStartIndex,
+ exprContext.getStop.getStopIndex))
}
}
-class KyuubiSparkSQLAstBuilder extends KyuubiSparkSQLAstBuilderBase {
- override def buildZorder(child: Seq[Expression]): ZorderBase = {
- Zorder(child)
- }
+/**
+ * A logical plan that contains an unparsed expression, which will be parsed by Spark later.
+ */
+trait UnparsedExpressionLogicalPlan extends LogicalPlan {
+ override def output: Seq[Attribute] = throw new UnsupportedOperationException()
- override def buildOptimizeZorderStatement(
- tableIdentifier: Seq[String],
- query: LogicalPlan): OptimizeZorderStatementBase = {
- OptimizeZorderStatement(tableIdentifier, query)
- }
+ override def children: Seq[LogicalPlan] = throw new UnsupportedOperationException()
+
+ protected def withNewChildrenInternal(
+ newChildren: IndexedSeq[LogicalPlan]): LogicalPlan =
+ throw new UnsupportedOperationException()
}
+
+case class UnparsedPredicateOptimize(
+ tableIdent: Seq[String],
+ tablePredicate: Option[String],
+ orderExpr: Expression) extends UnparsedExpressionLogicalPlan {}
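The forked `isLikelySelective` above gates which WHERE clauses OPTIMIZE accepts: a NOT defers to its child, a conjunction is selective if either side is, and a disjunction only if both sides are. A compact Python model of the same recursion over a toy tuple-encoded AST (the encoding is illustrative, not Spark's Expression tree):

```python
def is_likely_selective(expr):
    # expr: ("not", e) | ("and", l, r) | ("or", l, r) | (leaf_kind,)
    kind = expr[0]
    if kind == "not":
        return is_likely_selective(expr[1])
    if kind == "and":  # one selective conjunct is enough to prune partitions
        return is_likely_selective(expr[1]) or is_likely_selective(expr[2])
    if kind == "or":   # every disjunct must prune, or none effectively does
        return is_likely_selective(expr[1]) and is_likely_selective(expr[2])
    # leaves considered selective: comparisons, IN lists, LIKE/regex,
    # string predicates (contains / startsWith / endsWith)
    return kind in {"comparison", "in", "like", "string_predicate"}
```

Anything that fails this check is rejected with the "unsupported partition predicates" error exercised in the test suite above.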
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala
index b3c58afdf5a..e44309192a9 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/KyuubiWatchDogException.scala
@@ -23,3 +23,8 @@ final class MaxPartitionExceedException(
private val reason: String = "",
private val cause: Throwable = None.orNull)
extends KyuubiSQLExtensionException(reason, cause)
+
+final class MaxFileSizeExceedException(
+ private val reason: String = "",
+ private val cause: Throwable = None.orNull)
+ extends KyuubiSQLExtensionException(reason, cause)
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxPartitionStrategy.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxPartitionStrategy.scala
deleted file mode 100644
index 61ab07adfb1..00000000000
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxPartitionStrategy.scala
+++ /dev/null
@@ -1,185 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.sql.watchdog
-
-import org.apache.hadoop.fs.Path
-import org.apache.spark.sql.{PruneFileSourcePartitionHelper, SparkSession, Strategy}
-import org.apache.spark.sql.catalyst.SQLConfHelper
-import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
-import org.apache.spark.sql.catalyst.planning.ScanOperation
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.execution.SparkPlan
-import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
-import org.apache.spark.sql.types.StructType
-
-import org.apache.kyuubi.sql.KyuubiSQLConf
-
-/**
- * Add maxPartitions Strategy to avoid scan excessive partitions on partitioned table
- * 1 Check if scan exceed maxPartition
- * 2 Check if Using partitionFilter on partitioned table
- * This Strategy Add Planner Strategy after LogicalOptimizer
- */
-case class MaxPartitionStrategy(session: SparkSession)
- extends Strategy
- with SQLConfHelper
- with PruneFileSourcePartitionHelper {
- override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
- val maxScanPartitionsOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_MAX_PARTITIONS)
-
- if (maxScanPartitionsOpt.isDefined) {
- checkRelationMaxPartitions(plan, maxScanPartitionsOpt.get)
- }
- Nil
- }
-
- private def checkRelationMaxPartitions(
- plan: LogicalPlan,
- maxScanPartitions: Int): Unit = {
- plan match {
- case ScanOperation(_, _, relation: HiveTableRelation) if relation.isPartitioned =>
- relation.prunedPartitions match {
- case Some(prunedPartitions) =>
- if (prunedPartitions.size > maxScanPartitions) {
- throw new MaxPartitionExceedException(
- s"""
- |SQL job scan hive partition: ${prunedPartitions.size}
- |exceed restrict of hive scan maxPartition $maxScanPartitions
- |You should optimize your SQL logical according partition structure
- |or shorten query scope such as p_date, detail as below:
- |Table: ${relation.tableMeta.qualifiedName}
- |Owner: ${relation.tableMeta.owner}
- |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
- |""".stripMargin)
- }
- case _ =>
- val totalPartitions = session
- .sessionState.catalog.externalCatalog.listPartitionNames(
- relation.tableMeta.database,
- relation.tableMeta.identifier.table)
- if (totalPartitions.size > maxScanPartitions) {
- throw new MaxPartitionExceedException(
- s"""
- |Your SQL job scan a whole huge table without any partition filter,
- |You should optimize your SQL logical according partition structure
- |or shorten query scope such as p_date, detail as below:
- |Table: ${relation.tableMeta.qualifiedName}
- |Owner: ${relation.tableMeta.owner}
- |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
- |""".stripMargin)
- }
- }
- case ScanOperation(
- _,
- filters,
- relation @ LogicalRelation(
- fsRelation @ HadoopFsRelation(
- fileIndex: InMemoryFileIndex,
- partitionSchema,
- _,
- _,
- _,
- _),
- _,
- _,
- _)) if fsRelation.partitionSchema.nonEmpty =>
- val (partitionKeyFilters, dataFilter) =
- getPartitionKeyFiltersAndDataFilters(
- fsRelation.sparkSession,
- relation,
- partitionSchema,
- filters,
- relation.output)
- val prunedPartitionSize = fileIndex.listFiles(
- partitionKeyFilters.toSeq,
- dataFilter)
- .size
- if (prunedPartitionSize > maxScanPartitions) {
- throw maxPartitionExceedError(
- prunedPartitionSize,
- maxScanPartitions,
- relation.catalogTable,
- fileIndex.rootPaths,
- fsRelation.partitionSchema)
- }
- case ScanOperation(
- _,
- filters,
- logicalRelation @ LogicalRelation(
- fsRelation @ HadoopFsRelation(
- catalogFileIndex: CatalogFileIndex,
- partitionSchema,
- _,
- _,
- _,
- _),
- _,
- _,
- _)) if fsRelation.partitionSchema.nonEmpty =>
- val (partitionKeyFilters, _) =
- getPartitionKeyFiltersAndDataFilters(
- fsRelation.sparkSession,
- logicalRelation,
- partitionSchema,
- filters,
- logicalRelation.output)
-
- val prunedPartitionSize =
- catalogFileIndex.filterPartitions(
- partitionKeyFilters.toSeq)
- .partitionSpec()
- .partitions
- .size
- if (prunedPartitionSize > maxScanPartitions) {
- throw maxPartitionExceedError(
- prunedPartitionSize,
- maxScanPartitions,
- logicalRelation.catalogTable,
- catalogFileIndex.rootPaths,
- fsRelation.partitionSchema)
- }
- case _ =>
- }
- }
-
- def maxPartitionExceedError(
- prunedPartitionSize: Int,
- maxPartitionSize: Int,
- tableMeta: Option[CatalogTable],
- rootPaths: Seq[Path],
- partitionSchema: StructType): Throwable = {
- val truncatedPaths =
- if (rootPaths.length > 5) {
- rootPaths.slice(0, 5).mkString(",") + """... """ + (rootPaths.length - 5) + " more paths"
- } else {
- rootPaths.mkString(",")
- }
-
- new MaxPartitionExceedException(
- s"""
- |SQL job scan data source partition: $prunedPartitionSize
- |exceed restrict of data source scan maxPartition $maxPartitionSize
- |You should optimize your SQL logical according partition structure
- |or shorten query scope such as p_date, detail as below:
- |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
- |Owner: ${tableMeta.map(_.owner).getOrElse("")}
- |RootPaths: $truncatedPaths
- |Partition Structure: ${partitionSchema.map(_.name).mkString(", ")}
- |""".stripMargin)
- }
-}
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
new file mode 100644
index 00000000000..0ee693fcbec
--- /dev/null
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
@@ -0,0 +1,303 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.sql.watchdog
+
+import org.apache.hadoop.fs.Path
+import org.apache.spark.sql.{PruneFileSourcePartitionHelper, SparkSession, Strategy}
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
+import org.apache.spark.sql.catalyst.planning.ScanOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
+import org.apache.spark.sql.types.StructType
+
+import org.apache.kyuubi.sql.KyuubiSQLConf
+
+/**
+ * Add MaxScanStrategy to avoid scanning excessive partitions or files
+ * 1. Check if the scan exceeds maxPartition of a partitioned table
+ * 2. Check if the scan exceeds maxFileSize (calculated from Hive table and partition statistics)
+ * This strategy adds a planner strategy after the logical optimizer.
+ * @param session the active SparkSession
+ */
+case class MaxScanStrategy(session: SparkSession)
+ extends Strategy
+ with SQLConfHelper
+ with PruneFileSourcePartitionHelper {
+ override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
+ val maxScanPartitionsOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_MAX_PARTITIONS)
+ val maxFileSizeOpt = conf.getConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE)
+ if (maxScanPartitionsOpt.isDefined || maxFileSizeOpt.isDefined) {
+ checkScan(plan, maxScanPartitionsOpt, maxFileSizeOpt)
+ }
+ Nil
+ }
+
+ private def checkScan(
+ plan: LogicalPlan,
+ maxScanPartitionsOpt: Option[Int],
+ maxFileSizeOpt: Option[Long]): Unit = {
+ plan match {
+ case ScanOperation(_, _, relation: HiveTableRelation) =>
+ if (relation.isPartitioned) {
+ relation.prunedPartitions match {
+ case Some(prunedPartitions) =>
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitions.size)) {
+ throw new MaxPartitionExceedException(
+ s"""
+                |SQL job scans hive partitions: ${prunedPartitions.size},
+                |exceeding the restriction of hive scan maxPartition ${maxScanPartitionsOpt.get}.
+                |You should optimize your SQL logic according to the partition structure
+                |or narrow the query scope using partition columns such as p_date; details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+ lazy val scanFileSize = prunedPartitions.flatMap(_.stats).map(_.sizeInBytes).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ Some(relation.tableMeta),
+ prunedPartitions.flatMap(_.storage.locationUri).map(_.toString),
+ relation.partitionCols.map(_.name))
+ }
+ case _ =>
+ lazy val scanPartitions: Int = session
+ .sessionState.catalog.externalCatalog.listPartitionNames(
+ relation.tableMeta.database,
+ relation.tableMeta.identifier.table).size
+ if (maxScanPartitionsOpt.exists(_ < scanPartitions)) {
+ throw new MaxPartitionExceedException(
+ s"""
+                |Your SQL job scans a whole huge table without any partition filter.
+                |You should optimize your SQL logic according to the partition structure
+                |or narrow the query scope using partition columns such as p_date; details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+
+ lazy val scanFileSize: BigInt =
+ relation.tableMeta.stats.map(_.sizeInBytes).getOrElse {
+ session
+ .sessionState.catalog.externalCatalog.listPartitions(
+ relation.tableMeta.database,
+ relation.tableMeta.identifier.table).flatMap(_.stats).map(_.sizeInBytes).sum
+ }
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw new MaxFileSizeExceedException(
+ s"""
+              |Your SQL job scans a whole huge table without any partition filter.
+              |You should optimize your SQL logic according to the partition structure
+              |or narrow the query scope using partition columns such as p_date; details below:
+ |Table: ${relation.tableMeta.qualifiedName}
+ |Owner: ${relation.tableMeta.owner}
+ |Partition Structure: ${relation.partitionCols.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+ }
+ } else {
+ lazy val scanFileSize = relation.tableMeta.stats.map(_.sizeInBytes).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ Some(relation.tableMeta))
+ }
+ }
+ case ScanOperation(
+ _,
+ filters,
+ relation @ LogicalRelation(
+ fsRelation @ HadoopFsRelation(
+ fileIndex: InMemoryFileIndex,
+ partitionSchema,
+ _,
+ _,
+ _,
+ _),
+ _,
+ _,
+ _)) =>
+ if (fsRelation.partitionSchema.nonEmpty) {
+ val (partitionKeyFilters, dataFilter) =
+ getPartitionKeyFiltersAndDataFilters(
+ fsRelation.sparkSession,
+ relation,
+ partitionSchema,
+ filters,
+ relation.output)
+ val prunedPartitions = fileIndex.listFiles(
+ partitionKeyFilters.toSeq,
+ dataFilter)
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitions.size)) {
+ throw maxPartitionExceedError(
+ prunedPartitions.size,
+ maxScanPartitionsOpt.get,
+ relation.catalogTable,
+ fileIndex.rootPaths,
+ fsRelation.partitionSchema)
+ }
+ lazy val scanFileSize = prunedPartitions.flatMap(_.files).map(_.getLen).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ relation.catalogTable,
+ fileIndex.rootPaths.map(_.toString),
+ fsRelation.partitionSchema.map(_.name))
+ }
+ } else {
+ lazy val scanFileSize = fileIndex.sizeInBytes
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ relation.catalogTable)
+ }
+ }
+ case ScanOperation(
+ _,
+ filters,
+ logicalRelation @ LogicalRelation(
+ fsRelation @ HadoopFsRelation(
+ catalogFileIndex: CatalogFileIndex,
+ partitionSchema,
+ _,
+ _,
+ _,
+ _),
+ _,
+ _,
+ _)) =>
+ if (fsRelation.partitionSchema.nonEmpty) {
+ val (partitionKeyFilters, _) =
+ getPartitionKeyFiltersAndDataFilters(
+ fsRelation.sparkSession,
+ logicalRelation,
+ partitionSchema,
+ filters,
+ logicalRelation.output)
+
+ val fileIndex = catalogFileIndex.filterPartitions(
+ partitionKeyFilters.toSeq)
+
+ lazy val prunedPartitionSize = fileIndex.partitionSpec().partitions.size
+ if (maxScanPartitionsOpt.exists(_ < prunedPartitionSize)) {
+ throw maxPartitionExceedError(
+ prunedPartitionSize,
+ maxScanPartitionsOpt.get,
+ logicalRelation.catalogTable,
+ catalogFileIndex.rootPaths,
+ fsRelation.partitionSchema)
+ }
+
+ lazy val scanFileSize = fileIndex
+ .listFiles(Nil, Nil).flatMap(_.files).map(_.getLen).sum
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw partTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ logicalRelation.catalogTable,
+ catalogFileIndex.rootPaths.map(_.toString),
+ fsRelation.partitionSchema.map(_.name))
+ }
+ } else {
+ lazy val scanFileSize = catalogFileIndex.sizeInBytes
+ if (maxFileSizeOpt.exists(_ < scanFileSize)) {
+ throw nonPartTableMaxFileExceedError(
+ scanFileSize,
+ maxFileSizeOpt.get,
+ logicalRelation.catalogTable)
+ }
+ }
+ case _ =>
+ }
+ }
+
+ def maxPartitionExceedError(
+ prunedPartitionSize: Int,
+ maxPartitionSize: Int,
+ tableMeta: Option[CatalogTable],
+ rootPaths: Seq[Path],
+ partitionSchema: StructType): Throwable = {
+ val truncatedPaths =
+ if (rootPaths.length > 5) {
+        rootPaths.slice(0, 5).mkString(",") + "... " + (rootPaths.length - 5) + " more paths"
+ } else {
+ rootPaths.mkString(",")
+ }
+
+ new MaxPartitionExceedException(
+ s"""
+        |SQL job scans data source partitions: $prunedPartitionSize,
+        |exceeding the restriction of data source scan maxPartition $maxPartitionSize.
+        |You should optimize your SQL logic according to the partition structure
+        |or narrow the query scope using partition columns such as p_date; details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |RootPaths: $truncatedPaths
+ |Partition Structure: ${partitionSchema.map(_.name).mkString(", ")}
+ |""".stripMargin)
+ }
+
+ private def partTableMaxFileExceedError(
+ scanFileSize: Number,
+ maxFileSize: Long,
+ tableMeta: Option[CatalogTable],
+ rootPaths: Seq[String],
+ partitions: Seq[String]): Throwable = {
+ val truncatedPaths =
+ if (rootPaths.length > 5) {
+        rootPaths.slice(0, 5).mkString(",") + "... " + (rootPaths.length - 5) + " more paths"
+ } else {
+ rootPaths.mkString(",")
+ }
+
+ new MaxFileSizeExceedException(
+ s"""
+        |SQL job scans file size in bytes: $scanFileSize,
+        |exceeding the restriction of table scan maxFileSize $maxFileSize.
+        |You should optimize your SQL logic according to the partition structure
+        |or narrow the query scope using partition columns such as p_date; details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |RootPaths: $truncatedPaths
+ |Partition Structure: ${partitions.mkString(", ")}
+ |""".stripMargin)
+ }
+
+ private def nonPartTableMaxFileExceedError(
+ scanFileSize: Number,
+ maxFileSize: Long,
+ tableMeta: Option[CatalogTable]): Throwable = {
+ new MaxFileSizeExceedException(
+ s"""
+        |SQL job scans file size in bytes: $scanFileSize,
+        |exceeding the restriction of table scan maxFileSize $maxFileSize.
+        |Details below:
+ |Table: ${tableMeta.map(_.qualifiedName).getOrElse("")}
+ |Owner: ${tableMeta.map(_.owner).getOrElse("")}
+ |Location: ${tableMeta.map(_.location).getOrElse("")}
+ |""".stripMargin)
+ }
+}
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala
index a9bb5a5d758..895f9e24be3 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/OptimizeZorderStatementBase.scala
@@ -20,24 +20,15 @@ package org.apache.kyuubi.sql.zorder
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
-/**
- * A zorder statement that contains we parsed from SQL.
- * We should convert this plan to certain command at Analyzer.
- */
-abstract class OptimizeZorderStatementBase extends UnaryNode {
- def tableIdentifier: Seq[String]
- def query: LogicalPlan
- override def child: LogicalPlan = query
- override def output: Seq[Attribute] = child.output
-}
-
/**
 * A zorder statement that holds what we parsed from SQL.
 * We should convert this plan to a concrete command in the Analyzer.
*/
case class OptimizeZorderStatement(
tableIdentifier: Seq[String],
- query: LogicalPlan) extends OptimizeZorderStatementBase {
+ query: LogicalPlan) extends UnaryNode {
+ override def child: LogicalPlan = query
+ override def output: Seq[Attribute] = child.output
protected def withNewChildInternal(newChild: LogicalPlan): LogicalPlan =
copy(query = newChild)
}
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala
index cdead0b06d2..9f735caa7a7 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/main/scala/org/apache/kyuubi/sql/zorder/ResolveZorderBase.scala
@@ -57,7 +57,7 @@ abstract class ResolveZorderBase extends Rule[LogicalPlan] {
}
override def apply(plan: LogicalPlan): LogicalPlan = plan match {
- case statement: OptimizeZorderStatementBase if statement.query.resolved =>
+ case statement: OptimizeZorderStatement if statement.query.resolved =>
checkQueryAllowed(statement.query)
val tableIdentifier = getTableIdentifier(statement.tableIdentifier)
val catalogTable = session.sessionState.catalog.getTableMetadata(tableIdentifier)
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala
index e6ecd28c940..a202e813c5e 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/WatchDogSuiteBase.scala
@@ -17,10 +17,15 @@
package org.apache.spark.sql
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import org.apache.commons.io.FileUtils
import org.apache.spark.sql.catalyst.plans.logical.{GlobalLimit, LogicalPlan}
import org.apache.kyuubi.sql.KyuubiSQLConf
-import org.apache.kyuubi.sql.watchdog.MaxPartitionExceedException
+import org.apache.kyuubi.sql.watchdog.{MaxFileSizeExceedException, MaxPartitionExceedException}
trait WatchDogSuiteBase extends KyuubiSparkSQLExtensionTest {
override protected def beforeAll(): Unit = {
@@ -371,7 +376,7 @@ trait WatchDogSuiteBase extends KyuubiSparkSQLExtensionTest {
|ORDER BY a
|DESC
|""".stripMargin)
- .collect().head.get(0).equals(10))
+ .collect().head.get(0) === 10)
}
}
}
@@ -477,4 +482,120 @@ trait WatchDogSuiteBase extends KyuubiSparkSQLExtensionTest {
}
}
}
+
+ private def checkMaxFileSize(tableSize: Long, nonPartTableSize: Long): Unit = {
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> tableSize.toString) {
+ checkAnswer(sql("SELECT count(distinct(p)) FROM test"), Row(10) :: Nil)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> (tableSize / 2).toString) {
+ sql("SELECT * FROM test where p=1").queryExecution.sparkPlan
+
+ sql(s"SELECT * FROM test WHERE p in (${Range(0, 3).toList.mkString(",")})")
+ .queryExecution.sparkPlan
+
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test where p != 1").queryExecution.sparkPlan)
+
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test").queryExecution.sparkPlan)
+
+ intercept[MaxFileSizeExceedException](sql(
+ s"SELECT * FROM test WHERE p in (${Range(0, 6).toList.mkString(",")})")
+ .queryExecution.sparkPlan)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> nonPartTableSize.toString) {
+ checkAnswer(sql("SELECT count(*) FROM test_non_part"), Row(10000) :: Nil)
+ }
+
+ withSQLConf(KyuubiSQLConf.WATCHDOG_MAX_FILE_SIZE.key -> (nonPartTableSize - 1).toString) {
+ intercept[MaxFileSizeExceedException](
+ sql("SELECT * FROM test_non_part").queryExecution.sparkPlan)
+ }
+ }
+
+ test("watchdog with scan maxFileSize -- hive") {
+ Seq(false).foreach { convertMetastoreParquet =>
+ withTable("test", "test_non_part", "temp") {
+ spark.range(10000).selectExpr("id as col")
+ .createOrReplaceTempView("temp")
+
+ // partitioned table
+ sql(
+ s"""
+ |CREATE TABLE test(i int)
+ |PARTITIONED BY (p int)
+ |STORED AS parquet""".stripMargin)
+ for (part <- Range(0, 10)) {
+ sql(
+ s"""
+ |INSERT OVERWRITE TABLE test PARTITION (p='$part')
+ |select col from temp""".stripMargin)
+ }
+
+ val tablePath = new File(spark.sessionState.catalog.externalCatalog
+ .getTable("default", "test").location)
+ val tableSize = FileUtils.listFiles(tablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(tableSize > 0)
+
+ // non-partitioned table
+ sql(
+ s"""
+ |CREATE TABLE test_non_part(i int)
+ |STORED AS parquet""".stripMargin)
+ sql(
+ s"""
+ |INSERT OVERWRITE TABLE test_non_part
+ |select col from temp""".stripMargin)
+ sql("ANALYZE TABLE test_non_part COMPUTE STATISTICS")
+
+ val nonPartTablePath = new File(spark.sessionState.catalog.externalCatalog
+ .getTable("default", "test_non_part").location)
+ val nonPartTableSize = FileUtils.listFiles(nonPartTablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(nonPartTableSize > 0)
+
+ // check
+ withSQLConf("spark.sql.hive.convertMetastoreParquet" -> convertMetastoreParquet.toString) {
+ checkMaxFileSize(tableSize, nonPartTableSize)
+ }
+ }
+ }
+ }
+
+ test("watchdog with scan maxFileSize -- data source") {
+ withTempDir { dir =>
+ withTempView("test", "test_non_part") {
+ // partitioned table
+ val tablePath = new File(dir, "test")
+ spark.range(10).selectExpr("id", "id as p")
+ .write
+ .partitionBy("p")
+ .mode("overwrite")
+ .parquet(tablePath.getCanonicalPath)
+ spark.read.load(tablePath.getCanonicalPath).createOrReplaceTempView("test")
+
+ val tableSize = FileUtils.listFiles(tablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+ assert(tableSize > 0)
+
+ // non-partitioned table
+ val nonPartTablePath = new File(dir, "test_non_part")
+ spark.range(10000).selectExpr("id", "id as p")
+ .write
+ .mode("overwrite")
+ .parquet(nonPartTablePath.getCanonicalPath)
+ spark.read.load(nonPartTablePath.getCanonicalPath).createOrReplaceTempView("test_non_part")
+
+ val nonPartTableSize = FileUtils.listFiles(nonPartTablePath, Array("parquet"), true).asScala
+ .map(_.length()).sum
+      assert(nonPartTableSize > 0)
+
+ // check
+ checkMaxFileSize(tableSize, nonPartTableSize)
+ }
+ }
+ }
}
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala
index b24533e6926..e0f86f85d84 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/ZorderSuiteBase.scala
@@ -18,9 +18,11 @@
package org.apache.spark.sql
import org.apache.spark.SparkConf
-import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.{Alias, Ascending, AttributeReference, Expression, ExpressionEvalHelper, Literal, NullsLast, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation, Project, Sort}
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Ascending, AttributeReference, EqualTo, Expression, ExpressionEvalHelper, Literal, NullsLast, SortOrder}
+import org.apache.spark.sql.catalyst.parser.{ParseException, ParserInterface}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, OneRowRelation, Project, Sort}
import org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand
import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
import org.apache.spark.sql.functions._
@@ -29,7 +31,7 @@ import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
import org.apache.spark.sql.types._
import org.apache.kyuubi.sql.{KyuubiSQLConf, KyuubiSQLExtensionException}
-import org.apache.kyuubi.sql.zorder.{OptimizeZorderCommandBase, Zorder, ZorderBytesUtils}
+import org.apache.kyuubi.sql.zorder.{OptimizeZorderCommandBase, OptimizeZorderStatement, Zorder, ZorderBytesUtils}
trait ZorderSuiteBase extends KyuubiSparkSQLExtensionTest with ExpressionEvalHelper {
override def sparkConf(): SparkConf = {
@@ -245,20 +247,22 @@ trait ZorderSuiteBase extends KyuubiSparkSQLExtensionTest with ExpressionEvalHel
resHasSort: Boolean): Unit = {
def checkSort(plan: LogicalPlan): Unit = {
assert(plan.isInstanceOf[Sort] === resHasSort)
- if (plan.isInstanceOf[Sort]) {
- val colArr = cols.split(",")
- val refs =
- if (colArr.length == 1) {
- plan.asInstanceOf[Sort].order.head
- .child.asInstanceOf[AttributeReference] :: Nil
- } else {
- plan.asInstanceOf[Sort].order.head
- .child.asInstanceOf[Zorder].children.map(_.references.head)
+ plan match {
+ case sort: Sort =>
+ val colArr = cols.split(",")
+ val refs =
+ if (colArr.length == 1) {
+ sort.order.head
+ .child.asInstanceOf[AttributeReference] :: Nil
+ } else {
+ sort.order.head
+ .child.asInstanceOf[Zorder].children.map(_.references.head)
+ }
+ assert(refs.size === colArr.size)
+ refs.zip(colArr).foreach { case (ref, col) =>
+ assert(ref.name === col.trim)
}
- assert(refs.size === colArr.size)
- refs.zip(colArr).foreach { case (ref, col) =>
- assert(ref.name === col.trim)
- }
+ case _ =>
}
}
@@ -652,6 +656,99 @@ trait ZorderSuiteBase extends KyuubiSparkSQLExtensionTest with ExpressionEvalHel
ZorderBytesUtils.interleaveBitsDefault(inputs.map(ZorderBytesUtils.toByteArray).toArray)))
}
}
+
+ test("OPTIMIZE command is parsed as expected") {
+ val parser = createParser
+ val globalSort = spark.conf.get(KyuubiSQLConf.ZORDER_GLOBAL_SORT_ENABLED)
+
+ assert(parser.parsePlan("OPTIMIZE p zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(Seq(UnresolvedStar(None)), UnresolvedRelation(TableIdentifier("p"))))))
+
+ assert(parser.parsePlan("OPTIMIZE p zorder by c1, c2") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(
+ Zorder(Seq(UnresolvedAttribute("c1"), UnresolvedAttribute("c2"))),
+ Ascending,
+ NullsLast,
+ Seq.empty) :: Nil,
+ globalSort,
+ Project(Seq(UnresolvedStar(None)), UnresolvedRelation(TableIdentifier("p"))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = 1 zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(UnresolvedAttribute("id"), Literal(1)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = 1 zorder by c1, c2") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(
+ Zorder(Seq(UnresolvedAttribute("c1"), UnresolvedAttribute("c2"))),
+ Ascending,
+ NullsLast,
+ Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(UnresolvedAttribute("id"), Literal(1)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+ assert(parser.parsePlan("OPTIMIZE p where id = current_date() zorder by c1") ===
+ OptimizeZorderStatement(
+ Seq("p"),
+ Sort(
+ SortOrder(UnresolvedAttribute("c1"), Ascending, NullsLast, Seq.empty) :: Nil,
+ globalSort,
+ Project(
+ Seq(UnresolvedStar(None)),
+ Filter(
+ EqualTo(
+ UnresolvedAttribute("id"),
+ UnresolvedFunction("current_date", Seq.empty, false)),
+ UnresolvedRelation(TableIdentifier("p")))))))
+
+    // TODO: add support for the following cases
+ intercept[ParseException] {
+ parser.parsePlan("OPTIMIZE p zorder by (c1)")
+ }
+
+ intercept[ParseException] {
+ parser.parsePlan("OPTIMIZE p zorder by (c1, c2)")
+ }
+ }
+
+ test("OPTIMIZE partition predicates constraint") {
+ withTable("p") {
+ sql("CREATE TABLE p (c1 INT, c2 INT) PARTITIONED BY (event_date DATE)")
+ val e1 = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE p WHERE event_date = current_date as c ZORDER BY c1, c2")
+ }
+ assert(e1.getMessage.contains("unsupported partition predicates"))
+
+ val e2 = intercept[KyuubiSQLExtensionException] {
+ sql("OPTIMIZE p WHERE c1 = 1 ZORDER BY c1, c2")
+ }
+ assert(e2.getMessage == "Only partition column filters are allowed")
+ }
+ }
+
+ def createParser: ParserInterface
}
trait ZorderWithCodegenEnabledSuiteBase extends ZorderSuiteBase {
diff --git a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala
index c8c1b021d5a..b891a7224a0 100644
--- a/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala
+++ b/extensions/spark/kyuubi-extension-spark-common/src/test/scala/org/apache/spark/sql/benchmark/KyuubiBenchmarkBase.scala
@@ -22,6 +22,7 @@ import java.io.{File, FileOutputStream, OutputStream}
import scala.collection.JavaConverters._
import com.google.common.reflect.ClassPath
+import org.scalatest.Assertions._
trait KyuubiBenchmarkBase {
var output: Option[OutputStream] = None
diff --git a/extensions/spark/kyuubi-extension-spark-jdbc-dialect/pom.xml b/extensions/spark/kyuubi-extension-spark-jdbc-dialect/pom.xml
index 48c4c437923..ea571644e1d 100644
--- a/extensions/spark/kyuubi-extension-spark-jdbc-dialect/pom.xml
+++ b/extensions/spark/kyuubi-extension-spark-jdbc-dialect/pom.xml
@@ -21,12 +21,12 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
- kyuubi-extension-spark-jdbc-dialect_2.12
+ kyuubi-extension-spark-jdbc-dialect_${scala.binary.version}jarKyuubi Spark JDBC Dialect pluginhttps://kyuubi.apache.org/
diff --git a/extensions/spark/kyuubi-spark-authz/README.md b/extensions/spark/kyuubi-spark-authz/README.md
index 554797ee01d..374f83b0379 100644
--- a/extensions/spark/kyuubi-spark-authz/README.md
+++ b/extensions/spark/kyuubi-spark-authz/README.md
@@ -26,7 +26,7 @@
## Build
```shell
-build/mvn clean package -pl :kyuubi-spark-authz_2.12 -Dspark.version=3.2.1 -Dranger.version=2.3.0
+build/mvn clean package -DskipTests -pl :kyuubi-spark-authz_2.12 -am -Dspark.version=3.2.1 -Dranger.version=2.4.0
```
### Supported Apache Spark Versions
@@ -34,7 +34,8 @@ build/mvn clean package -pl :kyuubi-spark-authz_2.12 -Dspark.version=3.2.1 -Dran
`-Dspark.version=`
- [x] master
-- [x] 3.3.x (default)
+- [x] 3.4.x (default)
+- [x] 3.3.x
- [x] 3.2.x
- [x] 3.1.x
- [x] 3.0.x
@@ -44,7 +45,8 @@ build/mvn clean package -pl :kyuubi-spark-authz_2.12 -Dspark.version=3.2.1 -Dran
`-Dranger.version=`
-- [x] 2.3.x (default)
+- [x] 2.4.x (default)
+- [x] 2.3.x
- [x] 2.2.x
- [x] 2.1.x
- [x] 2.0.x
@@ -52,5 +54,5 @@ build/mvn clean package -pl :kyuubi-spark-authz_2.12 -Dspark.version=3.2.1 -Dran
- [x] 1.1.x
- [x] 1.0.x
- [x] 0.7.x
-- [x] 0.6.x
+- [ ] 0.6.x
diff --git a/extensions/spark/kyuubi-spark-authz/pom.xml b/extensions/spark/kyuubi-spark-authz/pom.xml
index 8df1b9465a9..1ae63fcb34f 100644
--- a/extensions/spark/kyuubi-spark-authz/pom.xml
+++ b/extensions/spark/kyuubi-spark-authz/pom.xml
@@ -21,12 +21,12 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOT../../../pom.xml
- kyuubi-spark-authz_2.12
+ kyuubi-spark-authz_${scala.binary.version}jarKyuubi Dev Spark Authorization Extensionhttps://kyuubi.apache.org/
@@ -39,6 +39,11 @@
+
+ org.apache.kyuubi
+ kyuubi-util-scala_${scala.binary.version}
+ ${project.version}
+ org.apache.rangerranger-plugins-common
@@ -321,7 +326,6 @@
-
${project.basedir}/src/test/resources
@@ -331,4 +335,31 @@
target/scala-${scala.binary.version}/test-classes
+
+
+ gen-policy
+
+
+
+ org.codehaus.mojo
+ build-helper-maven-plugin
+
+
+ add-test-source
+
+ add-test-source
+
+ generate-sources
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor
index 4686bb033cf..2facb004a04 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor
@@ -17,4 +17,5 @@
org.apache.kyuubi.plugin.spark.authz.serde.ExpressionInfoFunctionExtractor
org.apache.kyuubi.plugin.spark.authz.serde.FunctionIdentifierFunctionExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.QualifiedNameStringFunctionExtractor
org.apache.kyuubi.plugin.spark.authz.serde.StringFunctionExtractor
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor
index 475f47afc24..3bb0ee6c23e 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor
@@ -17,4 +17,5 @@
org.apache.kyuubi.plugin.spark.authz.serde.ExpressionInfoFunctionTypeExtractor
org.apache.kyuubi.plugin.spark.authz.serde.FunctionIdentifierFunctionTypeExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.FunctionNameFunctionTypeExtractor
org.apache.kyuubi.plugin.spark.authz.serde.TempMarkerFunctionTypeExtractor
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor
index f4d7eb503bd..78f836c65cd 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/META-INF/services/org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor
@@ -18,8 +18,12 @@
org.apache.kyuubi.plugin.spark.authz.serde.CatalogTableOptionTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.CatalogTableTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.DataSourceV2RelationTableExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.ExpressionSeqTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.IdentifierTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.LogicalRelationTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.ResolvedDbObjectNameTableExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.ResolvedIdentifierTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.ResolvedTableTableExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.StringTableExtractor
org.apache.kyuubi.plugin.spark.authz.serde.TableIdentifierTableExtractor
+org.apache.kyuubi.plugin.spark.authz.serde.TableTableExtractor
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/database_command_spec.json b/extensions/spark/kyuubi-spark-authz/src/main/resources/database_command_spec.json
index 4eb4b3ef8c9..c640ed89bce 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/database_command_spec.json
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/database_command_spec.json
@@ -22,6 +22,11 @@
"fieldExtractor" : "CatalogPluginCatalogExtractor"
},
"isInput" : false
+ }, {
+ "fieldName" : "name",
+ "fieldExtractor" : "ResolvedNamespaceDatabaseExtractor",
+ "catalogDesc" : null,
+ "isInput" : false
} ],
"opType" : "CREATEDATABASE"
}, {
@@ -45,6 +50,11 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.SetCatalogAndNamespace",
"databaseDescs" : [ {
+ "fieldName" : "child",
+ "fieldExtractor" : "ResolvedNamespaceDatabaseExtractor",
+ "catalogDesc" : null,
+ "isInput" : true
+ }, {
"fieldName" : "child",
"fieldExtractor" : "ResolvedDBObjectNameDatabaseExtractor",
"catalogDesc" : null,
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/function_command_spec.json b/extensions/spark/kyuubi-spark-authz/src/main/resources/function_command_spec.json
index c9398561423..0b71245d218 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/function_command_spec.json
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/function_command_spec.json
@@ -1,6 +1,16 @@
[ {
"classname" : "org.apache.spark.sql.execution.command.CreateFunctionCommand",
"functionDescs" : [ {
+ "fieldName" : "identifier",
+ "fieldExtractor" : "FunctionIdentifierFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "isTemp",
+ "fieldExtractor" : "TempMarkerFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP" ]
+ },
+ "isInput" : false
+ }, {
"fieldName" : "functionName",
"fieldExtractor" : "StringFunctionExtractor",
"databaseDesc" : {
@@ -44,6 +54,16 @@
}, {
"classname" : "org.apache.spark.sql.execution.command.DropFunctionCommand",
"functionDescs" : [ {
+ "fieldName" : "identifier",
+ "fieldExtractor" : "FunctionIdentifierFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "isTemp",
+ "fieldExtractor" : "TempMarkerFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP" ]
+ },
+ "isInput" : false
+ }, {
"fieldName" : "functionName",
"fieldExtractor" : "StringFunctionExtractor",
"databaseDesc" : {
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/scan_command_spec.json b/extensions/spark/kyuubi-spark-authz/src/main/resources/scan_command_spec.json
index 9a6aef4ed98..3273ccbeaf0 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/scan_command_spec.json
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/scan_command_spec.json
@@ -4,26 +4,86 @@
"fieldName" : "catalogTable",
"fieldExtractor" : "CatalogTableTableExtractor",
"catalogDesc" : null
- } ]
+ } ],
+ "functionDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.catalyst.catalog.HiveTableRelation",
"scanDescs" : [ {
"fieldName" : "tableMeta",
"fieldExtractor" : "CatalogTableTableExtractor",
"catalogDesc" : null
- } ]
+ } ],
+ "functionDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.execution.datasources.LogicalRelation",
"scanDescs" : [ {
"fieldName" : "catalogTable",
"fieldExtractor" : "CatalogTableOptionTableExtractor",
"catalogDesc" : null
- } ]
+ } ],
+ "functionDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation",
"scanDescs" : [ {
"fieldName" : null,
"fieldExtractor" : "DataSourceV2RelationTableExtractor",
"catalogDesc" : null
+ } ],
+ "functionDescs" : [ ]
+}, {
+ "classname" : "org.apache.spark.sql.hive.HiveGenericUDF",
+ "scanDescs" : [ ],
+ "functionDescs" : [ {
+ "fieldName" : "name",
+ "fieldExtractor" : "QualifiedNameStringFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "name",
+ "fieldExtractor" : "FunctionNameFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP", "SYSTEM" ]
+ },
+ "isInput" : true
+ } ]
+}, {
+ "classname" : "org.apache.spark.sql.hive.HiveGenericUDTF",
+ "scanDescs" : [ ],
+ "functionDescs" : [ {
+ "fieldName" : "name",
+ "fieldExtractor" : "QualifiedNameStringFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "name",
+ "fieldExtractor" : "FunctionNameFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP", "SYSTEM" ]
+ },
+ "isInput" : true
+ } ]
+}, {
+ "classname" : "org.apache.spark.sql.hive.HiveSimpleUDF",
+ "scanDescs" : [ ],
+ "functionDescs" : [ {
+ "fieldName" : "name",
+ "fieldExtractor" : "QualifiedNameStringFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "name",
+ "fieldExtractor" : "FunctionNameFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP", "SYSTEM" ]
+ },
+ "isInput" : true
+ } ]
+}, {
+ "classname" : "org.apache.spark.sql.hive.HiveUDAFFunction",
+ "scanDescs" : [ ],
+ "functionDescs" : [ {
+ "fieldName" : "name",
+ "fieldExtractor" : "QualifiedNameStringFunctionExtractor",
+ "databaseDesc" : null,
+ "functionTypeDesc" : {
+ "fieldName" : "name",
+ "fieldExtractor" : "FunctionNameFunctionTypeExtractor",
+ "skipTypes" : [ "TEMP", "SYSTEM" ]
+ },
+ "isInput" : true
} ]
} ]
\ No newline at end of file
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/resources/table_command_spec.json b/extensions/spark/kyuubi-spark-authz/src/main/resources/table_command_spec.json
index f1c2297b38e..3e191146862 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/resources/table_command_spec.json
+++ b/extensions/spark/kyuubi-spark-authz/src/main/resources/table_command_spec.json
@@ -91,6 +91,20 @@
"fieldName" : "plan",
"fieldExtractor" : "LogicalPlanQueryExtractor"
} ]
+}, {
+ "classname" : "org.apache.spark.sql.catalyst.plans.logical.Call",
+ "tableDescs" : [ {
+ "fieldName" : "args",
+ "fieldExtractor" : "ExpressionSeqTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ } ],
+ "opType" : "ALTERTABLE_PROPERTIES",
+ "queryDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.CommentOnTable",
"tableDescs" : [ {
@@ -108,6 +122,15 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.CreateTable",
"tableDescs" : [ {
+ "fieldName" : "child",
+ "fieldExtractor" : "ResolvedIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableName",
"fieldExtractor" : "IdentifierTableExtractor",
"columnDesc" : null,
@@ -134,6 +157,15 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect",
"tableDescs" : [ {
+ "fieldName" : "left",
+ "fieldExtractor" : "ResolvedIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableName",
"fieldExtractor" : "IdentifierTableExtractor",
"columnDesc" : null,
@@ -264,6 +296,15 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.DropTable",
"tableDescs" : [ {
+ "fieldName" : "child",
+ "fieldExtractor" : "ResolvedIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "child",
"fieldExtractor" : "ResolvedTableTableExtractor",
"columnDesc" : null,
@@ -432,6 +473,15 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.ReplaceTable",
"tableDescs" : [ {
+ "fieldName" : "child",
+ "fieldExtractor" : "ResolvedIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableName",
"fieldExtractor" : "IdentifierTableExtractor",
"columnDesc" : null,
@@ -458,6 +508,15 @@
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.ReplaceTableAsSelect",
"tableDescs" : [ {
+ "fieldName" : "left",
+ "fieldExtractor" : "ResolvedIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableName",
"fieldExtractor" : "IdentifierTableExtractor",
"columnDesc" : null,
@@ -806,6 +865,15 @@
}, {
"classname" : "org.apache.spark.sql.execution.command.AnalyzeColumnCommand",
"tableDescs" : [ {
+ "fieldName" : "tableIdent",
+ "fieldExtractor" : "TableIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableIdent",
"fieldExtractor" : "TableIdentifierTableExtractor",
"columnDesc" : {
@@ -830,11 +898,20 @@
"isInput" : true,
"setCurrentDatabaseIfMissing" : false
} ],
- "opType" : "ANALYZE_TABLE",
+ "opType" : "ALTERTABLE_PROPERTIES",
"queryDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.execution.command.AnalyzePartitionCommand",
"tableDescs" : [ {
+ "fieldName" : "tableIdent",
+ "fieldExtractor" : "TableIdentifierTableExtractor",
+ "columnDesc" : null,
+ "actionTypeDesc" : null,
+ "tableTypeDesc" : null,
+ "catalogDesc" : null,
+ "isInput" : false,
+ "setCurrentDatabaseIfMissing" : false
+ }, {
"fieldName" : "tableIdent",
"fieldExtractor" : "TableIdentifierTableExtractor",
"columnDesc" : {
@@ -847,7 +924,7 @@
"isInput" : true,
"setCurrentDatabaseIfMissing" : false
} ],
- "opType" : "ANALYZE_TABLE",
+ "opType" : "ALTERTABLE_PROPERTIES",
"queryDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.execution.command.AnalyzeTableCommand",
@@ -858,14 +935,9 @@
"actionTypeDesc" : null,
"tableTypeDesc" : null,
"catalogDesc" : null,
- "isInput" : true,
+ "isInput" : false,
"setCurrentDatabaseIfMissing" : false
- } ],
- "opType" : "ANALYZE_TABLE",
- "queryDescs" : [ ]
-}, {
- "classname" : "org.apache.spark.sql.execution.command.AnalyzeTablesCommand",
- "tableDescs" : [ {
+ }, {
"fieldName" : "tableIdent",
"fieldExtractor" : "TableIdentifierTableExtractor",
"columnDesc" : null,
@@ -875,7 +947,7 @@
"isInput" : true,
"setCurrentDatabaseIfMissing" : false
} ],
- "opType" : "ANALYZE_TABLE",
+ "opType" : "ALTERTABLE_PROPERTIES",
"queryDescs" : [ ]
}, {
"classname" : "org.apache.spark.sql.execution.command.CacheTableCommand",
@@ -1243,14 +1315,6 @@
"fieldName" : "query",
"fieldExtractor" : "LogicalPlanQueryExtractor"
} ]
-}, {
- "classname" : "org.apache.spark.sql.execution.datasources.InsertIntoHiveDirCommand",
- "tableDescs" : [ ],
- "opType" : "QUERY",
- "queryDescs" : [ {
- "fieldName" : "query",
- "fieldExtractor" : "LogicalPlanQueryExtractor"
- } ]
}, {
"classname" : "org.apache.spark.sql.execution.datasources.RefreshTable",
"tableDescs" : [ {
@@ -1293,6 +1357,14 @@
"fieldName" : "query",
"fieldExtractor" : "LogicalPlanQueryExtractor"
} ]
+}, {
+ "classname" : "org.apache.spark.sql.hive.execution.InsertIntoHiveDirCommand",
+ "tableDescs" : [ ],
+ "opType" : "QUERY",
+ "queryDescs" : [ {
+ "fieldName" : "query",
+ "fieldExtractor" : "LogicalPlanQueryExtractor"
+ } ]
}, {
"classname" : "org.apache.spark.sql.hive.execution.InsertIntoHiveTable",
"tableDescs" : [ {
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala
index b8220ea2732..5c496b8744b 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala
@@ -28,6 +28,7 @@ import org.apache.kyuubi.plugin.spark.authz.OperationType.OperationType
import org.apache.kyuubi.plugin.spark.authz.PrivilegeObjectActionType._
import org.apache.kyuubi.plugin.spark.authz.serde._
import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
object PrivilegesBuilder {
@@ -208,7 +209,39 @@ object PrivilegesBuilder {
}
}
- type PrivilegesAndOpType = (Seq[PrivilegeObject], Seq[PrivilegeObject], OperationType)
+ type PrivilegesAndOpType = (Iterable[PrivilegeObject], Iterable[PrivilegeObject], OperationType)
+
+ /**
+ * Build input privilege objects from a Spark's LogicalPlan for hive permanent udf
+ *
+ * @param plan A Spark LogicalPlan
+ */
+ def buildFunctions(
+ plan: LogicalPlan,
+ spark: SparkSession): PrivilegesAndOpType = {
+ val inputObjs = new ArrayBuffer[PrivilegeObject]
+ plan match {
+ case command: Command if isKnownTableCommand(command) =>
+ val spec = getTableCommandSpec(command)
+ val functionPrivAndOpType = spec.queries(plan)
+ .map(plan => buildFunctions(plan, spark))
+ functionPrivAndOpType.map(_._1)
+ .reduce(_ ++ _)
+ .foreach(functionPriv => inputObjs += functionPriv)
+
+ case plan => plan transformAllExpressions {
+ case hiveFunction: Expression if isKnownFunction(hiveFunction) =>
+ val functionSpec: ScanSpec = getFunctionSpec(hiveFunction)
+ if (functionSpec.functionDescs
+ .exists(!_.functionTypeDesc.get.skip(hiveFunction, spark))) {
+ functionSpec.functions(hiveFunction).foreach(func =>
+ inputObjs += PrivilegeObject(func))
+ }
+ hiveFunction
+ }
+ }
+ (inputObjs, Seq.empty, OperationType.QUERY)
+ }
/**
* Build input and output privilege objects from a Spark's LogicalPlan
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessRequest.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessRequest.scala
index 4997dda3b87..8fc8028e683 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessRequest.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessRequest.scala
@@ -27,7 +27,7 @@ import org.apache.ranger.plugin.policyengine.{RangerAccessRequestImpl, RangerPol
import org.apache.kyuubi.plugin.spark.authz.OperationType.OperationType
import org.apache.kyuubi.plugin.spark.authz.ranger.AccessType._
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils.{invoke, invokeAs}
+import org.apache.kyuubi.util.reflect.ReflectUtils._
case class AccessRequest private (accessType: AccessType) extends RangerAccessRequestImpl
@@ -50,7 +50,7 @@ object AccessRequest {
"getRolesFromUserAndGroups",
(classOf[String], userName),
(classOf[JSet[String]], userGroups))
- invoke(req, "setUserRoles", (classOf[JSet[String]], roles))
+ invokeAs[Unit](req, "setUserRoles", (classOf[JSet[String]], roles))
} catch {
case _: Exception =>
}
@@ -61,7 +61,7 @@ object AccessRequest {
}
try {
val clusterName = invokeAs[String](SparkRangerAdminPlugin, "getClusterName")
- invoke(req, "setClusterName", (classOf[String], clusterName))
+ invokeAs[Unit](req, "setClusterName", (classOf[String], clusterName))
} catch {
case _: Exception =>
}
@@ -74,8 +74,8 @@ object AccessRequest {
private def getUserGroupsFromUserStore(user: UserGroupInformation): Option[JSet[String]] = {
try {
- val storeEnricher = invoke(SparkRangerAdminPlugin, "getUserStoreEnricher")
- val userStore = invoke(storeEnricher, "getRangerUserStore")
+ val storeEnricher = invokeAs[AnyRef](SparkRangerAdminPlugin, "getUserStoreEnricher")
+ val userStore = invokeAs[AnyRef](storeEnricher, "getRangerUserStore")
val userGroupMapping =
invokeAs[JHashMap[String, JSet[String]]](userStore, "getUserGroupMapping")
Some(userGroupMapping.get(user.getShortUserName))
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessType.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessType.scala
index 7d62229ee41..c0b7d2a03ef 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessType.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/AccessType.scala
@@ -58,7 +58,12 @@ object AccessType extends Enumeration {
SHOWPARTITIONS |
ANALYZE_TABLE => SELECT
case SHOWCOLUMNS | DESCTABLE => SELECT
- case SHOWDATABASES | SWITCHDATABASE | DESCDATABASE | SHOWTABLES | SHOWFUNCTIONS => USE
+ case SHOWDATABASES |
+ SWITCHDATABASE |
+ DESCDATABASE |
+ SHOWTABLES |
+ SHOWFUNCTIONS |
+ DESCFUNCTION => USE
case TRUNCATETABLE => UPDATE
case _ => NONE
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/FilterDataSourceV2Strategy.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/FilterDataSourceV2Strategy.scala
index d39aacdcf91..cbf79581ed6 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/FilterDataSourceV2Strategy.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/FilterDataSourceV2Strategy.scala
@@ -17,13 +17,20 @@
package org.apache.kyuubi.plugin.spark.authz.ranger
import org.apache.spark.sql.{SparkSession, Strategy}
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.kyuubi.plugin.spark.authz.util.ObjectFilterPlaceHolder
class FilterDataSourceV2Strategy(spark: SparkSession) extends Strategy {
override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+ // For Spark 3.1 and below, `ColumnPruning` rule will set `ObjectFilterPlaceHolder#child` to
+ // `Project`
+ case ObjectFilterPlaceHolder(Project(_, child)) if child.nodeName == "ShowNamespaces" =>
+ spark.sessionState.planner.plan(child)
+ .map(FilteredShowNamespaceExec(_, spark.sparkContext)).toSeq
+
+ // For Spark 3.2 and above
case ObjectFilterPlaceHolder(child) if child.nodeName == "ShowNamespaces" =>
spark.sessionState.planner.plan(child)
.map(FilteredShowNamespaceExec(_, spark.sparkContext)).toSeq
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleAuthorization.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleAuthorization.scala
index 3d53174f3e6..3203108dfae 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleAuthorization.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleAuthorization.scala
@@ -53,7 +53,7 @@ object RuleAuthorization {
requests += AccessRequest(resource, ugi, opType, AccessType.USE)
}
- def addAccessRequest(objects: Seq[PrivilegeObject], isInput: Boolean): Unit = {
+ def addAccessRequest(objects: Iterable[PrivilegeObject], isInput: Boolean): Unit = {
objects.foreach { obj =>
val resource = AccessResource(obj, opType)
val accessType = ranger.AccessType(obj, opType, isInput)
@@ -84,7 +84,7 @@ object RuleAuthorization {
}
case _ => Seq(request)
}
- }
+ }.toSeq
if (authorizeInSingleCall) {
verify(requestArrays.flatten, auditHandler)
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleReplaceShowObjectCommands.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleReplaceShowObjectCommands.scala
index 08d2b4fd024..bf762109cba 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleReplaceShowObjectCommands.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleReplaceShowObjectCommands.scala
@@ -26,15 +26,13 @@ import org.apache.spark.sql.execution.command.{RunnableCommand, ShowColumnsComma
import org.apache.kyuubi.plugin.spark.authz.{ObjectType, OperationType}
import org.apache.kyuubi.plugin.spark.authz.util.{AuthZUtils, ObjectFilterPlaceHolder, WithInternalChildren}
+import org.apache.kyuubi.util.reflect.ReflectUtils._
class RuleReplaceShowObjectCommands extends Rule[LogicalPlan] {
override def apply(plan: LogicalPlan): LogicalPlan = plan match {
case r: RunnableCommand if r.nodeName == "ShowTablesCommand" => FilteredShowTablesCommand(r)
case n: LogicalPlan if n.nodeName == "ShowTables" =>
ObjectFilterPlaceHolder(n)
- // show databases in spark2.4.x
- case r: RunnableCommand if r.nodeName == "ShowDatabasesCommand" =>
- FilteredShowDatabasesCommand(r)
case n: LogicalPlan if n.nodeName == "ShowNamespaces" =>
ObjectFilterPlaceHolder(n)
case r: RunnableCommand if r.nodeName == "ShowFunctionsCommand" =>
@@ -48,7 +46,7 @@ class RuleReplaceShowObjectCommands extends Rule[LogicalPlan] {
case class FilteredShowTablesCommand(delegated: RunnableCommand)
extends FilteredShowObjectCommand(delegated) {
- var isExtended: Boolean = AuthZUtils.getFieldVal(delegated, "isExtended").asInstanceOf[Boolean]
+ private val isExtended = getField[Boolean](delegated, "isExtended")
override protected def isAllowed(r: Row, ugi: UserGroupInformation): Boolean = {
val database = r.getString(0)
@@ -63,18 +61,6 @@ case class FilteredShowTablesCommand(delegated: RunnableCommand)
}
}
-case class FilteredShowDatabasesCommand(delegated: RunnableCommand)
- extends FilteredShowObjectCommand(delegated) {
-
- override protected def isAllowed(r: Row, ugi: UserGroupInformation): Boolean = {
- val database = r.getString(0)
- val resource = AccessResource(ObjectType.DATABASE, database, null, null)
- val request = AccessRequest(resource, ugi, OperationType.SHOWDATABASES, AccessType.USE)
- val result = SparkRangerAdminPlugin.isAccessAllowed(request)
- result != null && result.getIsAllowed
- }
-}
-
abstract class FilteredShowObjectCommand(delegated: RunnableCommand)
extends RunnableCommand with WithInternalChildren {
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPlugin.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPlugin.scala
index 78e59ff897f..9abb9cd2805 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPlugin.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPlugin.scala
@@ -79,7 +79,7 @@ object SparkRangerAdminPlugin extends RangerBasePlugin("spark", "sparkSql")
() => {
if (plugin != null) {
LOG.info(s"clean up ranger plugin, appId: ${plugin.getAppId}")
- this.cleanup()
+ plugin.cleanup()
}
},
Integer.MAX_VALUE)
@@ -109,7 +109,7 @@ object SparkRangerAdminPlugin extends RangerBasePlugin("spark", "sparkSql")
} else if (result.getMaskTypeDef != null) {
result.getMaskTypeDef.getName match {
case "MASK" => regexp_replace(col)
- case "MASK_SHOW_FIRST_4" if isSparkVersionAtLeast("3.1") =>
+ case "MASK_SHOW_FIRST_4" if isSparkV31OrGreater =>
regexp_replace(col, hasLen = true)
case "MASK_SHOW_FIRST_4" =>
val right = regexp_replace(s"substr($col, 5)")
@@ -136,7 +136,8 @@ object SparkRangerAdminPlugin extends RangerBasePlugin("spark", "sparkSql")
val upper = s"regexp_replace($expr, '[A-Z]', 'X'$pos)"
val lower = s"regexp_replace($upper, '[a-z]', 'x'$pos)"
val digits = s"regexp_replace($lower, '[0-9]', 'n'$pos)"
- digits
+ val other = s"regexp_replace($digits, '[^A-Za-z0-9]', 'U'$pos)"
+ other
}
/**
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/CommandSpec.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/CommandSpec.scala
index e96ef8cbfd6..32ad30e211f 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/CommandSpec.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/CommandSpec.scala
@@ -19,6 +19,7 @@ package org.apache.kyuubi.plugin.spark.authz.serde
import com.fasterxml.jackson.annotation.JsonIgnore
import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.slf4j.LoggerFactory
@@ -94,7 +95,8 @@ case class TableCommandSpec(
case class ScanSpec(
classname: String,
- scanDescs: Seq[ScanDesc]) extends CommandSpec {
+ scanDescs: Seq[ScanDesc],
+ functionDescs: Seq[FunctionDesc] = Seq.empty) extends CommandSpec {
override def opType: String = OperationType.QUERY.toString
def tables: (LogicalPlan, SparkSession) => Seq[Table] = (plan, spark) => {
scanDescs.flatMap { td =>
@@ -107,4 +109,16 @@ case class ScanSpec(
}
}
}
+
+ def functions: (Expression) => Seq[Function] = (expr) => {
+ functionDescs.flatMap { fd =>
+ try {
+ Some(fd.extract(expr))
+ } catch {
+ case e: Exception =>
+ LOG.debug(fd.error(expr, e))
+ None
+ }
+ }
+ }
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Descriptor.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Descriptor.scala
index d8c866b8875..fc660ce143e 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Descriptor.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Descriptor.scala
@@ -23,18 +23,9 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.kyuubi.plugin.spark.authz.PrivilegeObjectActionType
import org.apache.kyuubi.plugin.spark.authz.PrivilegeObjectActionType.PrivilegeObjectActionType
-import org.apache.kyuubi.plugin.spark.authz.serde.ActionTypeExtractor.actionTypeExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.CatalogExtractor.catalogExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.ColumnExtractor.columnExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.DatabaseExtractor.dbExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor.functionExtractors
import org.apache.kyuubi.plugin.spark.authz.serde.FunctionType.FunctionType
-import org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor.functionTypeExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.QueryExtractor.queryExtractors
-import org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor.tableExtractors
import org.apache.kyuubi.plugin.spark.authz.serde.TableType.TableType
-import org.apache.kyuubi.plugin.spark.authz.serde.TableTypeExtractor.tableTypeExtractors
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
/**
* A database object(such as database, table, function) descriptor describes its name and getter
@@ -81,8 +72,8 @@ case class ColumnDesc(
fieldName: String,
fieldExtractor: String) extends Descriptor {
override def extract(v: AnyRef): Seq[String] = {
- val columnsVal = invoke(v, fieldName)
- val columnExtractor = columnExtractors(fieldExtractor)
+ val columnsVal = invokeAs[AnyRef](v, fieldName)
+ val columnExtractor = lookupExtractor[ColumnExtractor](fieldExtractor)
columnExtractor(columnsVal)
}
}
@@ -100,8 +91,8 @@ case class DatabaseDesc(
catalogDesc: Option[CatalogDesc] = None,
isInput: Boolean = false) extends Descriptor {
override def extract(v: AnyRef): Database = {
- val databaseVal = invoke(v, fieldName)
- val databaseExtractor = dbExtractors(fieldExtractor)
+ val databaseVal = invokeAs[AnyRef](v, fieldName)
+ val databaseExtractor = lookupExtractor[DatabaseExtractor](fieldExtractor)
val db = databaseExtractor(databaseVal)
if (db.catalog.isEmpty && catalogDesc.nonEmpty) {
val maybeCatalog = catalogDesc.get.extract(v)
@@ -128,8 +119,8 @@ case class FunctionTypeDesc(
}
def extract(v: AnyRef, spark: SparkSession): FunctionType = {
- val functionTypeVal = invoke(v, fieldName)
- val functionTypeExtractor = functionTypeExtractors(fieldExtractor)
+ val functionTypeVal = invokeAs[AnyRef](v, fieldName)
+ val functionTypeExtractor = lookupExtractor[FunctionTypeExtractor](fieldExtractor)
functionTypeExtractor(functionTypeVal, spark)
}
@@ -154,8 +145,8 @@ case class FunctionDesc(
functionTypeDesc: Option[FunctionTypeDesc] = None,
isInput: Boolean = false) extends Descriptor {
override def extract(v: AnyRef): Function = {
- val functionVal = invoke(v, fieldName)
- val functionExtractor = functionExtractors(fieldExtractor)
+ val functionVal = invokeAs[AnyRef](v, fieldName)
+ val functionExtractor = lookupExtractor[FunctionExtractor](fieldExtractor)
var function = functionExtractor(functionVal)
if (function.database.isEmpty) {
val maybeDatabase = databaseDesc.map(_.extract(v))
@@ -179,8 +170,8 @@ case class QueryDesc(
fieldName: String,
fieldExtractor: String = "LogicalPlanQueryExtractor") extends Descriptor {
override def extract(v: AnyRef): Option[LogicalPlan] = {
- val queryVal = invoke(v, fieldName)
- val queryExtractor = queryExtractors(fieldExtractor)
+ val queryVal = invokeAs[AnyRef](v, fieldName)
+ val queryExtractor = lookupExtractor[QueryExtractor](fieldExtractor)
queryExtractor(queryVal)
}
}
@@ -201,8 +192,8 @@ case class TableTypeDesc(
}
def extract(v: AnyRef, spark: SparkSession): TableType = {
- val tableTypeVal = invoke(v, fieldName)
- val tableTypeExtractor = tableTypeExtractors(fieldExtractor)
+ val tableTypeVal = invokeAs[AnyRef](v, fieldName)
+ val tableTypeExtractor = lookupExtractor[TableTypeExtractor](fieldExtractor)
tableTypeExtractor(tableTypeVal, spark)
}
@@ -239,8 +230,8 @@ case class TableDesc(
}
def extract(v: AnyRef, spark: SparkSession): Option[Table] = {
- val tableVal = invoke(v, fieldName)
- val tableExtractor = tableExtractors(fieldExtractor)
+ val tableVal = invokeAs[AnyRef](v, fieldName)
+ val tableExtractor = lookupExtractor[TableExtractor](fieldExtractor)
val maybeTable = tableExtractor(spark, tableVal)
maybeTable.map { t =>
if (t.catalog.isEmpty && catalogDesc.nonEmpty) {
@@ -266,9 +257,9 @@ case class ActionTypeDesc(
actionType: Option[String] = None) extends Descriptor {
override def extract(v: AnyRef): PrivilegeObjectActionType = {
actionType.map(PrivilegeObjectActionType.withName).getOrElse {
- val actionTypeVal = invoke(v, fieldName)
- val extractor = actionTypeExtractors(fieldExtractor)
- extractor(actionTypeVal)
+ val actionTypeVal = invokeAs[AnyRef](v, fieldName)
+ val actionTypeExtractor = lookupExtractor[ActionTypeExtractor](fieldExtractor)
+ actionTypeExtractor(actionTypeVal)
}
}
}
@@ -283,9 +274,9 @@ case class CatalogDesc(
fieldName: String = "catalog",
fieldExtractor: String = "CatalogPluginCatalogExtractor") extends Descriptor {
override def extract(v: AnyRef): Option[String] = {
- val catalogVal = invoke(v, fieldName)
- val extractor = catalogExtractors(fieldExtractor)
- extractor(catalogVal)
+ val catalogVal = invokeAs[AnyRef](v, fieldName)
+ val catalogExtractor = lookupExtractor[CatalogExtractor](fieldExtractor)
+ catalogExtractor(catalogVal)
}
}
@@ -301,9 +292,9 @@ case class ScanDesc(
val tableVal = if (fieldName == null) {
v
} else {
- invoke(v, fieldName)
+ invokeAs[AnyRef](v, fieldName)
}
- val tableExtractor = tableExtractors(fieldExtractor)
+ val tableExtractor = lookupExtractor[TableExtractor](fieldExtractor)
val maybeTable = tableExtractor(spark, tableVal)
maybeTable.map { t =>
if (t.catalog.isEmpty && catalogDesc.nonEmpty) {
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Function.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Function.scala
index b7a0010b4b5..ba19972ed5f 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Function.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/Function.scala
@@ -21,8 +21,8 @@ package org.apache.kyuubi.plugin.spark.authz.serde
* :: Developer API ::
*
* Represents a function identity
- *
+ * @param catalog
* @param database
* @param functionName
*/
-case class Function(database: Option[String], functionName: String)
+case class Function(catalog: Option[String], database: Option[String], functionName: String)
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/catalogExtractors.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/catalogExtractors.scala
index 0b7d712230e..e48becb325f 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/catalogExtractors.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/catalogExtractors.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.plugin.spark.authz.serde
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
trait CatalogExtractor extends (AnyRef => Option[String]) with Extractor
@@ -43,7 +43,7 @@ class CatalogPluginOptionCatalogExtractor extends CatalogExtractor {
override def apply(v1: AnyRef): Option[String] = {
v1 match {
case Some(catalogPlugin: AnyRef) =>
- new CatalogPluginCatalogExtractor().apply(catalogPlugin)
+ lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogPlugin)
case _ => None
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/databaseExtractors.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/databaseExtractors.scala
index 4e9270e7838..713d3e3fb75 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/databaseExtractors.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/databaseExtractors.scala
@@ -18,6 +18,7 @@
package org.apache.kyuubi.plugin.spark.authz.serde
import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
trait DatabaseExtractor extends (AnyRef => Database) with Extractor
@@ -68,9 +69,9 @@ class StringSeqOptionDatabaseExtractor extends DatabaseExtractor {
*/
class ResolvedNamespaceDatabaseExtractor extends DatabaseExtractor {
override def apply(v1: AnyRef): Database = {
- val catalogVal = invoke(v1, "catalog")
- val catalog = new CatalogPluginCatalogExtractor().apply(catalogVal)
- val namespace = getFieldVal[Seq[String]](v1, "namespace")
+ val catalogVal = invokeAs[AnyRef](v1, "catalog")
+ val catalog = lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogVal)
+ val namespace = getField[Seq[String]](v1, "namespace")
Database(catalog, quote(namespace))
}
}
@@ -80,9 +81,9 @@ class ResolvedNamespaceDatabaseExtractor extends DatabaseExtractor {
*/
class ResolvedDBObjectNameDatabaseExtractor extends DatabaseExtractor {
override def apply(v1: AnyRef): Database = {
- val catalogVal = invoke(v1, "catalog")
- val catalog = new CatalogPluginCatalogExtractor().apply(catalogVal)
- val namespace = getFieldVal[Seq[String]](v1, "nameParts")
+ val catalogVal = invokeAs[AnyRef](v1, "catalog")
+ val catalog = lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogVal)
+ val namespace = getField[Seq[String]](v1, "nameParts")
Database(catalog, quote(namespace))
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionExtractors.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionExtractors.scala
index 894a6cb8f2f..bcd5f266573 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionExtractors.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionExtractors.scala
@@ -20,12 +20,26 @@ package org.apache.kyuubi.plugin.spark.authz.serde
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor.buildFunctionFromQualifiedName
+
trait FunctionExtractor extends (AnyRef => Function) with Extractor
object FunctionExtractor {
val functionExtractors: Map[String, FunctionExtractor] = {
loadExtractorsToMap[FunctionExtractor]
}
+
+ private[authz] def buildFunctionFromQualifiedName(qualifiedName: String): Function = {
+ val parts: Array[String] = qualifiedName.split("\\.")
+ val (catalog, database, functionName) = if (parts.length == 3) {
+ (Some(parts.head), Some(parts.tail.head), parts.last)
+ } else if (parts.length == 2) {
+ (None, Some(parts.head), parts.last)
+ } else {
+ (None, None, qualifiedName)
+ }
+ Function(catalog, database, functionName)
+ }
}
/**
@@ -33,7 +47,17 @@ object FunctionExtractor {
*/
class StringFunctionExtractor extends FunctionExtractor {
override def apply(v1: AnyRef): Function = {
- Function(None, v1.asInstanceOf[String])
+ Function(None, None, v1.asInstanceOf[String])
+ }
+}
+
+/**
+ * String
+ */
+class QualifiedNameStringFunctionExtractor extends FunctionExtractor {
+ override def apply(v1: AnyRef): Function = {
+ val qualifiedName: String = v1.asInstanceOf[String]
+ buildFunctionFromQualifiedName(qualifiedName)
}
}
@@ -43,7 +67,7 @@ class StringFunctionExtractor extends FunctionExtractor {
class FunctionIdentifierFunctionExtractor extends FunctionExtractor {
override def apply(v1: AnyRef): Function = {
val identifier = v1.asInstanceOf[FunctionIdentifier]
- Function(identifier.database, identifier.funcName)
+ Function(None, identifier.database, identifier.funcName)
}
}
@@ -53,6 +77,6 @@ class FunctionIdentifierFunctionExtractor extends FunctionExtractor {
class ExpressionInfoFunctionExtractor extends FunctionExtractor {
override def apply(v1: AnyRef): Function = {
val info = v1.asInstanceOf[ExpressionInfo]
- Function(Option(info.getDb), info.getName)
+ Function(None, Option(info.getDb), info.getName)
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionTypeExtractors.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionTypeExtractors.scala
index 4c5e9dc8452..c134b501815 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionTypeExtractors.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/functionTypeExtractors.scala
@@ -19,8 +19,11 @@ package org.apache.kyuubi.plugin.spark.authz.serde
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.FunctionIdentifier
+import org.apache.spark.sql.catalyst.catalog.SessionCatalog
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor.buildFunctionFromQualifiedName
import org.apache.kyuubi.plugin.spark.authz.serde.FunctionType.{FunctionType, PERMANENT, SYSTEM, TEMP}
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor.getFunctionType
object FunctionType extends Enumeration {
type FunctionType = Value
@@ -33,6 +36,19 @@ object FunctionTypeExtractor {
val functionTypeExtractors: Map[String, FunctionTypeExtractor] = {
loadExtractorsToMap[FunctionTypeExtractor]
}
+
+ def getFunctionType(fi: FunctionIdentifier, catalog: SessionCatalog): FunctionType = {
+ fi match {
+ case temp if catalog.isTemporaryFunction(temp) =>
+ TEMP
+ case permanent if catalog.isPersistentFunction(permanent) =>
+ PERMANENT
+ case system if catalog.isRegisteredFunction(system) =>
+ SYSTEM
+ case _ =>
+ TEMP
+ }
+ }
}
/**
@@ -53,9 +69,9 @@ class TempMarkerFunctionTypeExtractor extends FunctionTypeExtractor {
*/
class ExpressionInfoFunctionTypeExtractor extends FunctionTypeExtractor {
override def apply(v1: AnyRef, spark: SparkSession): FunctionType = {
- val function = new ExpressionInfoFunctionExtractor().apply(v1)
+ val function = lookupExtractor[ExpressionInfoFunctionExtractor].apply(v1)
val fi = FunctionIdentifier(function.functionName, function.database)
- new FunctionIdentifierFunctionTypeExtractor().apply(fi, spark)
+ lookupExtractor[FunctionIdentifierFunctionTypeExtractor].apply(fi, spark)
}
}
@@ -66,14 +82,18 @@ class FunctionIdentifierFunctionTypeExtractor extends FunctionTypeExtractor {
override def apply(v1: AnyRef, spark: SparkSession): FunctionType = {
val catalog = spark.sessionState.catalog
val fi = v1.asInstanceOf[FunctionIdentifier]
- if (catalog.isTemporaryFunction(fi)) {
- TEMP
- } else if (catalog.isPersistentFunction(fi)) {
- PERMANENT
- } else if (catalog.isRegisteredFunction(fi)) {
- SYSTEM
- } else {
- TEMP
- }
+ getFunctionType(fi, catalog)
+ }
+}
+
+/**
+ * String
+ */
+class FunctionNameFunctionTypeExtractor extends FunctionTypeExtractor {
+ override def apply(v1: AnyRef, spark: SparkSession): FunctionType = {
+ val catalog: SessionCatalog = spark.sessionState.catalog
+ val qualifiedName: String = v1.asInstanceOf[String]
+ val function = buildFunctionFromQualifiedName(qualifiedName)
+ getFunctionType(FunctionIdentifier(function.functionName, function.database), catalog)
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/package.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/package.scala
index a52a558a00a..6863516b698 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/package.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/package.scala
@@ -17,9 +17,6 @@
package org.apache.kyuubi.plugin.spark.authz
-import java.util.ServiceLoader
-
-import scala.collection.JavaConverters._
import scala.reflect.ClassTag
import com.fasterxml.jackson.core.`type`.TypeReference
@@ -28,16 +25,23 @@ import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.kyuubi.plugin.spark.authz.OperationType.{OperationType, QUERY}
+import org.apache.kyuubi.plugin.spark.authz.serde.ActionTypeExtractor.actionTypeExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.CatalogExtractor.catalogExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.ColumnExtractor.columnExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.DatabaseExtractor.dbExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionExtractor.functionExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionTypeExtractor.functionTypeExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.QueryExtractor.queryExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor.tableExtractors
+import org.apache.kyuubi.plugin.spark.authz.serde.TableTypeExtractor.tableTypeExtractors
+import org.apache.kyuubi.util.reflect.ReflectUtils._
package object serde {
final val mapper = JsonMapper.builder().addModule(DefaultScalaModule).build()
- def loadExtractorsToMap[T <: Extractor](implicit ct: ClassTag[T]): Map[String, T] = {
- ServiceLoader.load(ct.runtimeClass).iterator().asScala
- .map { case e: Extractor => (e.key, e.asInstanceOf[T]) }
- .toMap
- }
+ def loadExtractorsToMap[T <: Extractor](implicit ct: ClassTag[T]): Map[String, T] =
+ loadFromServiceLoader[T]()(ct).map { e: T => (e.key, e) }.toMap
final lazy val DB_COMMAND_SPECS: Map[String, DatabaseCommandSpec] = {
val is = getClass.getClassLoader.getResourceAsStream("database_command_spec.json")
@@ -68,7 +72,8 @@ package object serde {
final private lazy val SCAN_SPECS: Map[String, ScanSpec] = {
val is = getClass.getClassLoader.getResourceAsStream("scan_command_spec.json")
mapper.readValue(is, new TypeReference[Array[ScanSpec]] {})
- .map(e => (e.classname, e)).toMap
+ .map(e => (e.classname, e))
+ .filter(t => t._2.scanDescs.nonEmpty).toMap
}
def isKnownScan(r: AnyRef): Boolean = {
@@ -79,6 +84,21 @@ package object serde {
SCAN_SPECS(r.getClass.getName)
}
+ final private lazy val FUNCTION_SPECS: Map[String, ScanSpec] = {
+ val is = getClass.getClassLoader.getResourceAsStream("scan_command_spec.json")
+ mapper.readValue(is, new TypeReference[Array[ScanSpec]] {})
+ .map(e => (e.classname, e))
+ .filter(t => t._2.functionDescs.nonEmpty).toMap
+ }
+
+ def isKnownFunction(r: AnyRef): Boolean = {
+ FUNCTION_SPECS.contains(r.getClass.getName)
+ }
+
+ def getFunctionSpec(r: AnyRef): ScanSpec = {
+ FUNCTION_SPECS(r.getClass.getName)
+ }
+
def operationType(plan: LogicalPlan): OperationType = {
val classname = plan.getClass.getName
TABLE_COMMAND_SPECS.get(classname)
@@ -87,4 +107,33 @@ package object serde {
.map(s => s.operationType)
.getOrElse(QUERY)
}
+
+ /**
+   * Get an extractor instance by its key, i.e. the extractor's simple class name.
+   * @param extractorKey the simple class name of the extractor to look up;
+   *                     the no-arg overload defaults it to the class tag's simple name.
+ * @param ct class tag of extractor class type
+ * @tparam T extractor class type
+ * @return
+ */
+ def lookupExtractor[T <: Extractor](extractorKey: String)(
+ implicit ct: ClassTag[T]): T = {
+ val extractorClass = ct.runtimeClass
+ val extractors: Map[String, Extractor] = extractorClass match {
+ case c if classOf[CatalogExtractor].isAssignableFrom(c) => catalogExtractors
+ case c if classOf[DatabaseExtractor].isAssignableFrom(c) => dbExtractors
+ case c if classOf[TableExtractor].isAssignableFrom(c) => tableExtractors
+ case c if classOf[TableTypeExtractor].isAssignableFrom(c) => tableTypeExtractors
+ case c if classOf[ColumnExtractor].isAssignableFrom(c) => columnExtractors
+ case c if classOf[QueryExtractor].isAssignableFrom(c) => queryExtractors
+ case c if classOf[FunctionExtractor].isAssignableFrom(c) => functionExtractors
+ case c if classOf[FunctionTypeExtractor].isAssignableFrom(c) => functionTypeExtractors
+ case c if classOf[ActionTypeExtractor].isAssignableFrom(c) => actionTypeExtractors
+ case _ => throw new IllegalArgumentException(s"Unknown extractor type: $ct")
+ }
+ extractors(extractorKey).asInstanceOf[T]
+ }
+
+ def lookupExtractor[T <: Extractor](implicit ct: ClassTag[T]): T =
+ lookupExtractor[T](ct.runtimeClass.getSimpleName)(ct)
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/tableExtractors.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/tableExtractors.scala
index c848381d426..94641d6d060 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/tableExtractors.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/serde/tableExtractors.scala
@@ -24,9 +24,11 @@ import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
/**
* A trait for extracting database and table as string tuple
@@ -46,10 +48,25 @@ object TableExtractor {
*/
def getOwner(v: AnyRef): Option[String] = {
// org.apache.spark.sql.connector.catalog.Table
- val table = invoke(v, "table")
+ val table = invokeAs[AnyRef](v, "table")
val properties = invokeAs[JMap[String, String]](table, "properties").asScala
properties.get("owner")
}
+
+ def getOwner(spark: SparkSession, catalogName: String, tableIdent: AnyRef): Option[String] = {
+ try {
+ val catalogManager = invokeAs[AnyRef](spark.sessionState, "catalogManager")
+ val catalog = invokeAs[AnyRef](catalogManager, "catalog", (classOf[String], catalogName))
+ val table = invokeAs[AnyRef](
+ catalog,
+ "loadTable",
+ (Class.forName("org.apache.spark.sql.connector.catalog.Identifier"), tableIdent))
+ getOwner(table)
+ } catch {
+ // Exception may occur due to invalid reflection or table not found
+ case _: Exception => None
+ }
+ }
}
/**
@@ -87,7 +104,7 @@ class CatalogTableTableExtractor extends TableExtractor {
class CatalogTableOptionTableExtractor extends TableExtractor {
override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
val catalogTable = v1.asInstanceOf[Option[CatalogTable]]
- catalogTable.flatMap(new CatalogTableTableExtractor().apply(spark, _))
+ catalogTable.flatMap(lookupExtractor[CatalogTableTableExtractor].apply(spark, _))
}
}
@@ -96,10 +113,10 @@ class CatalogTableOptionTableExtractor extends TableExtractor {
*/
class ResolvedTableTableExtractor extends TableExtractor {
override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
- val catalogVal = invoke(v1, "catalog")
- val catalog = new CatalogPluginCatalogExtractor().apply(catalogVal)
- val identifier = invoke(v1, "identifier")
- val maybeTable = new IdentifierTableExtractor().apply(spark, identifier)
+ val catalogVal = invokeAs[AnyRef](v1, "catalog")
+ val catalog = lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogVal)
+ val identifier = invokeAs[AnyRef](v1, "identifier")
+ val maybeTable = lookupExtractor[IdentifierTableExtractor].apply(spark, identifier)
val maybeOwner = TableExtractor.getOwner(v1)
maybeTable.map(_.copy(catalog = catalog, owner = maybeOwner))
}
@@ -116,6 +133,34 @@ class IdentifierTableExtractor extends TableExtractor {
}
}
+/**
+ * java.lang.String
+ * with name parts concatenated by "."
+ */
+class StringTableExtractor extends TableExtractor {
+ override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
+ val tableNameArr = v1.asInstanceOf[String].split("\\.")
+ val maybeTable = tableNameArr.length match {
+ case 1 => Table(None, None, tableNameArr(0), None)
+ case 2 => Table(None, Some(tableNameArr(0)), tableNameArr(1), None)
+ case 3 => Table(Some(tableNameArr(0)), Some(tableNameArr(1)), tableNameArr(2), None)
+ }
+ Option(maybeTable)
+ }
+}
+
+/**
+ * Seq[org.apache.spark.sql.catalyst.expressions.Expression]
+ */
+class ExpressionSeqTableExtractor extends TableExtractor {
+ override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
+ val expressions = v1.asInstanceOf[Seq[Expression]]
+ // Iceberg will rearrange the parameters according to the parameter order
+ // defined in the procedure, where the table parameters are currently always the first.
+ lookupExtractor[StringTableExtractor].apply(spark, expressions.head.toString())
+ }
+}
+
/**
* org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
*/
@@ -128,13 +173,12 @@ class DataSourceV2RelationTableExtractor extends TableExtractor {
case Some(v2Relation) =>
val maybeCatalogPlugin = invokeAs[Option[AnyRef]](v2Relation, "catalog")
val maybeCatalog = maybeCatalogPlugin.flatMap(catalogPlugin =>
- new CatalogPluginCatalogExtractor().apply(catalogPlugin))
- val maybeIdentifier = invokeAs[Option[AnyRef]](v2Relation, "identifier")
- maybeIdentifier.flatMap { id =>
- val maybeTable = new IdentifierTableExtractor().apply(spark, id)
- val maybeOwner = TableExtractor.getOwner(v2Relation)
- maybeTable.map(_.copy(catalog = maybeCatalog, owner = maybeOwner))
- }
+ lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogPlugin))
+ lookupExtractor[TableTableExtractor].apply(spark, invokeAs[AnyRef](v2Relation, "table"))
+ .map { table =>
+ val maybeOwner = TableExtractor.getOwner(v2Relation)
+ table.copy(catalog = maybeCatalog, owner = maybeOwner)
+ }
}
}
}
@@ -146,7 +190,7 @@ class LogicalRelationTableExtractor extends TableExtractor {
override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
val maybeCatalogTable = invokeAs[Option[AnyRef]](v1, "catalogTable")
maybeCatalogTable.flatMap { ct =>
- new CatalogTableTableExtractor().apply(spark, ct)
+ lookupExtractor[CatalogTableTableExtractor].apply(spark, ct)
}
}
}
@@ -156,11 +200,39 @@ class LogicalRelationTableExtractor extends TableExtractor {
*/
class ResolvedDbObjectNameTableExtractor extends TableExtractor {
override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
- val catalogVal = invoke(v1, "catalog")
- val catalog = new CatalogPluginCatalogExtractor().apply(catalogVal)
+ val catalogVal = invokeAs[AnyRef](v1, "catalog")
+ val catalog = lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogVal)
val nameParts = invokeAs[Seq[String]](v1, "nameParts")
val namespace = nameParts.init.toArray
val table = nameParts.last
Some(Table(catalog, Some(quote(namespace)), table, None))
}
}
+
+/**
+ * org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier
+ */
+class ResolvedIdentifierTableExtractor extends TableExtractor {
+ override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
+ v1.getClass.getName match {
+ case "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier" =>
+ val catalogVal = invokeAs[AnyRef](v1, "catalog")
+ val catalog = lookupExtractor[CatalogPluginCatalogExtractor].apply(catalogVal)
+ val identifier = invokeAs[AnyRef](v1, "identifier")
+ val maybeTable = lookupExtractor[IdentifierTableExtractor].apply(spark, identifier)
+ val owner = catalog.flatMap(name => TableExtractor.getOwner(spark, name, identifier))
+ maybeTable.map(_.copy(catalog = catalog, owner = owner))
+ case _ => None
+ }
+ }
+}
+
+/**
+ * org.apache.spark.sql.connector.catalog.Table
+ */
+class TableTableExtractor extends TableExtractor {
+ override def apply(spark: SparkSession, v1: AnyRef): Option[Table] = {
+ val tableName = invokeAs[String](v1, "name")
+ lookupExtractor[StringTableExtractor].apply(spark, tableName)
+ }
+}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/AuthZUtils.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/AuthZUtils.scala
index 5773e1c9340..4f7cbb9ef14 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/AuthZUtils.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/AuthZUtils.scala
@@ -23,8 +23,6 @@ import java.security.interfaces.ECPublicKey
import java.security.spec.X509EncodedKeySpec
import java.util.Base64
-import scala.util.{Failure, Success, Try}
-
import org.apache.commons.lang3.StringUtils
import org.apache.hadoop.security.UserGroupInformation
import org.apache.ranger.plugin.service.RangerBasePlugin
@@ -33,67 +31,12 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, View}
import org.apache.kyuubi.plugin.spark.authz.AccessControlException
import org.apache.kyuubi.plugin.spark.authz.util.ReservedKeys._
+import org.apache.kyuubi.util.SemanticVersion
+import org.apache.kyuubi.util.reflect.DynConstructors
+import org.apache.kyuubi.util.reflect.ReflectUtils._
private[authz] object AuthZUtils {
- /**
- * fixme error handling need improve here
- */
- def getFieldVal[T](o: Any, name: String): T = {
- Try {
- val field = o.getClass.getDeclaredField(name)
- field.setAccessible(true)
- field.get(o)
- } match {
- case Success(value) => value.asInstanceOf[T]
- case Failure(e) =>
- val candidates = o.getClass.getDeclaredFields.map(_.getName).mkString("[", ",", "]")
- throw new RuntimeException(s"$name not in ${o.getClass} $candidates", e)
- }
- }
-
- def getFieldValOpt[T](o: Any, name: String): Option[T] = Try(getFieldVal[T](o, name)).toOption
-
- def invoke(
- obj: AnyRef,
- methodName: String,
- args: (Class[_], AnyRef)*): AnyRef = {
- try {
- val (types, values) = args.unzip
- val method = obj.getClass.getMethod(methodName, types: _*)
- method.setAccessible(true)
- method.invoke(obj, values: _*)
- } catch {
- case e: NoSuchMethodException =>
- val candidates = obj.getClass.getMethods.map(_.getName).mkString("[", ",", "]")
- throw new RuntimeException(s"$methodName not in ${obj.getClass} $candidates", e)
- }
- }
-
- def invokeAs[T](
- obj: AnyRef,
- methodName: String,
- args: (Class[_], AnyRef)*): T = {
- invoke(obj, methodName, args: _*).asInstanceOf[T]
- }
-
- def invokeStatic(
- obj: Class[_],
- methodName: String,
- args: (Class[_], AnyRef)*): AnyRef = {
- val (types, values) = args.unzip
- val method = obj.getMethod(methodName, types: _*)
- method.setAccessible(true)
- method.invoke(obj, values: _*)
- }
-
- def invokeStaticAs[T](
- obj: Class[_],
- methodName: String,
- args: (Class[_], AnyRef)*): T = {
- invokeStatic(obj, methodName, args: _*).asInstanceOf[T]
- }
-
/**
* Get the active session user
* @param spark spark context instance
@@ -118,8 +61,8 @@ private[authz] object AuthZUtils {
def hasResolvedPermanentView(plan: LogicalPlan): Boolean = {
plan match {
- case view: View if view.resolved && isSparkVersionAtLeast("3.1.0") =>
- !getFieldVal[Boolean](view, "isTempView")
+ case view: View if view.resolved && isSparkV31OrGreater =>
+ !getField[Boolean](view, "isTempView")
case _ =>
false
}
@@ -127,7 +70,12 @@ private[authz] object AuthZUtils {
lazy val isRanger21orGreater: Boolean = {
try {
- classOf[RangerBasePlugin].getConstructor(classOf[String], classOf[String], classOf[String])
+ DynConstructors.builder().impl(
+ classOf[RangerBasePlugin],
+ classOf[String],
+ classOf[String],
+ classOf[String])
+ .buildChecked[RangerBasePlugin]()
true
} catch {
case _: NoSuchMethodException =>
@@ -135,30 +83,10 @@ private[authz] object AuthZUtils {
}
}
- def isSparkVersionAtMost(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtMost(targetVersionString)
- }
-
- def isSparkVersionAtLeast(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtLeast(targetVersionString)
- }
-
- def isSparkVersionEqualTo(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionEqualTo(targetVersionString)
- }
-
- /**
- * check if spark version satisfied
- * first param is option of supported most spark version,
- * and secont param is option of supported least spark version
- *
- * @return
- */
- def passSparkVersionCheck: (Option[String], Option[String]) => Boolean =
- (mostSparkVersion, leastSparkVersion) => {
- mostSparkVersion.forall(isSparkVersionAtMost) &&
- leastSparkVersion.forall(isSparkVersionAtLeast)
- }
+ lazy val SPARK_RUNTIME_VERSION: SemanticVersion = SemanticVersion(SPARK_VERSION)
+ lazy val isSparkV31OrGreater: Boolean = SPARK_RUNTIME_VERSION >= "3.1"
+ lazy val isSparkV32OrGreater: Boolean = SPARK_RUNTIME_VERSION >= "3.2"
+ lazy val isSparkV33OrGreater: Boolean = SPARK_RUNTIME_VERSION >= "3.3"
def quoteIfNeeded(part: String): String = {
if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) {
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/ObjectFilterPlaceHolder.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/ObjectFilterPlaceHolder.scala
index a5d1c0d3b54..0d3c39adb69 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/ObjectFilterPlaceHolder.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/ObjectFilterPlaceHolder.scala
@@ -18,9 +18,19 @@
package org.apache.kyuubi.plugin.spark.authz.util
import org.apache.spark.sql.catalyst.expressions.Attribute
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
+
+case class ObjectFilterPlaceHolder(child: LogicalPlan) extends UnaryNode
+ with WithInternalChild {
-case class ObjectFilterPlaceHolder(child: LogicalPlan) extends LeafNode {
override def output: Seq[Attribute] = child.output
- override def computeStats(): Statistics = child.stats
+
+ override def withNewChildInternal(newChild: LogicalPlan): LogicalPlan = {
+    // `FilterDataSourceV2Strategy` requires that child.nodeName stay unchanged
+ if (child.nodeName == newChild.nodeName) {
+ copy(newChild)
+ } else {
+ this
+ }
+ }
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/RangerConfigProvider.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/RangerConfigProvider.scala
index 83fe048e677..a61d94a8fc8 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/RangerConfigProvider.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/RangerConfigProvider.scala
@@ -20,6 +20,7 @@ package org.apache.kyuubi.plugin.spark.authz.util
import org.apache.hadoop.conf.Configuration
import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
trait RangerConfigProvider {
@@ -33,15 +34,13 @@ trait RangerConfigProvider {
* org.apache.ranger.authorization.hadoop.config.RangerConfiguration
* for Ranger 2.0 and below
*/
- def getRangerConf: Configuration = {
+ val getRangerConf: Configuration = {
if (isRanger21orGreater) {
// for Ranger 2.1+
- invokeAs[Configuration](this, "getConfig")
+ invokeAs(this, "getConfig")
} else {
// for Ranger 2.0 and below
- invokeStaticAs[Configuration](
- Class.forName("org.apache.ranger.authorization.hadoop.config.RangerConfiguration"),
- "getInstance")
+ invokeAs("org.apache.ranger.authorization.hadoop.config.RangerConfiguration", "getInstance")
}
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/SemanticVersion.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/SemanticVersion.scala
deleted file mode 100644
index 4d7e8972505..00000000000
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/util/SemanticVersion.scala
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.plugin.spark.authz.util
-
-/**
- * Encapsulate a component Spark version for the convenience of version checks.
- * Copy from org.apache.kyuubi.engine.ComponentVersion
- */
-case class SemanticVersion(majorVersion: Int, minorVersion: Int) {
-
- def isVersionAtMost(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor < targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor <= targetMinor
- })
- }
-
- def isVersionAtLeast(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor > targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor >= targetMinor
- })
- }
-
- def isVersionEqualTo(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- runtimeMajor == targetMajor && runtimeMinor == targetMinor)
- }
-
- def compareVersion(
- targetVersionString: String,
- callback: (Int, Int, Int, Int) => Boolean): Boolean = {
- val targetVersion = SemanticVersion(targetVersionString)
- val targetMajor = targetVersion.majorVersion
- val targetMinor = targetVersion.minorVersion
- callback(targetMajor, targetMinor, this.majorVersion, this.minorVersion)
- }
-
- override def toString: String = s"$majorVersion.$minorVersion"
-}
-
-object SemanticVersion {
-
- def apply(versionString: String): SemanticVersion = {
- """^(\d+)\.(\d+)(\..*)?$""".r.findFirstMatchIn(versionString) match {
- case Some(m) =>
- SemanticVersion(m.group(1).toInt, m.group(2).toInt)
- case None =>
- throw new IllegalArgumentException(s"Tried to parse '$versionString' as a project" +
- s" version string, but it could not find the major and minor version numbers.")
- }
- }
-}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/PolicyJsonFileGenerator.scala b/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/PolicyJsonFileGenerator.scala
new file mode 100644
index 00000000000..7faddd0c7fa
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/PolicyJsonFileGenerator.scala
@@ -0,0 +1,348 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.spark.authz.gen
+
+import java.nio.charset.StandardCharsets
+import java.nio.file.{Files, Paths, StandardOpenOption}
+import java.util.UUID
+
+import com.fasterxml.jackson.annotation.JsonInclude.Include
+import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.databind.node.ObjectNode
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+import org.apache.ranger.plugin.model.RangerPolicy
+import org.scalatest.funsuite.AnyFunSuite
+
+// scalastyle:off
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
+import org.apache.kyuubi.plugin.spark.authz.gen.KRangerPolicyItemAccess.allowTypes
+import org.apache.kyuubi.plugin.spark.authz.gen.KRangerPolicyResource._
+import org.apache.kyuubi.plugin.spark.authz.gen.RangerAccessType._
+import org.apache.kyuubi.plugin.spark.authz.gen.RangerClassConversions._
+import org.apache.kyuubi.util.AssertionUtils._
+
+/**
+ * Generates the policy file under the src/test/resources dir.
+ *
+ * To run the test suite:
+ * {{{
+ * KYUUBI_UPDATE=0 dev/gen/gen_ranger_policy_json.sh
+ * }}}
+ *
+ * To regenerate the ranger policy file:
+ * {{{
+ * dev/gen/gen_ranger_policy_json.sh
+ * }}}
+ */
+class PolicyJsonFileGenerator extends AnyFunSuite {
+ // scalastyle:on
+ final private val mapper: ObjectMapper = JsonMapper.builder()
+ .addModule(DefaultScalaModule)
+ .serializationInclusion(Include.NON_NULL)
+ .build()
+
+ test("check ranger policy file") {
+ val pluginHome = getClass.getProtectionDomain.getCodeSource.getLocation.getPath
+ .split("target").head
+ val policyFileName = "sparkSql_hive_jenkins.json"
+ val policyFilePath =
+ Paths.get(pluginHome, "src", "test", "resources", policyFileName)
+ val generatedStr = mapper.writerWithDefaultPrettyPrinter()
+ .writeValueAsString(servicePolicies)
+
+ if (sys.env.get("KYUUBI_UPDATE").contains("1")) {
+ // scalastyle:off println
+ println(s"Writing ranger policies to $policyFileName.")
+ // scalastyle:on println
+ Files.write(
+ policyFilePath,
+ generatedStr.getBytes(StandardCharsets.UTF_8),
+ StandardOpenOption.CREATE,
+ StandardOpenOption.TRUNCATE_EXISTING)
+ } else {
+ assertFileContent(
+ policyFilePath,
+ Seq(generatedStr),
+ "dev/gen/gen_ranger_policy_json.sh",
+ splitFirstExpectedLine = true)
+ }
+ }
+
+ private def servicePolicies: JsonNode = {
+ val inputStream = Thread.currentThread().getContextClassLoader
+ .getResourceAsStream("policies_base.json")
+ val rootObjNode = mapper.readTree(inputStream).asInstanceOf[ObjectNode]
+ val policies = genPolicies
+ // scalastyle:off println
+ println(s"Generated ${policies.size} policies.")
+ // scalastyle:on println
+ rootObjNode.set("policies", mapper.readTree(mapper.writeValueAsString(policies)))
+ }
+
+ private def genPolicies: Iterable[RangerPolicy] = {
+ List[RangerPolicy](
+ // access for all
+ policyAccessForAllUrl,
+ policyAccessForAllDbTableColumns,
+ policyAccessForAllDbUdf,
+ // access
+ policyAccessForDbAllColumns,
+ policyAccessForDefaultDbSrcTable,
+ policyAccessForDefaultBobUse,
+ policyAccessForDefaultBobSelect,
+ policyAccessForPermViewAccessOnly,
+ // row filter
+ policyFilterForSrcTableKeyLessThan20,
+ policyFilterForPermViewKeyLessThan20,
+ // data masking
+ policyMaskForPermView,
+ policyMaskForPermViewUser,
+ policyMaskNullifyForValue2,
+ policyMaskShowFirst4ForValue3,
+ policyMaskDateShowYearForValue4,
+ policyMaskShowFirst4ForValue5)
+      // fill the id and guid with an auto-incremented index
+ .zipWithIndex
+ .map {
+ case (p, index) =>
+ p.setId(index)
+ p.setGuid(UUID.nameUUIDFromBytes(index.toString.getBytes()).toString)
+ p
+ }
+ }
+
+ // resources
+ private val allDatabaseRes = databaseRes("*")
+ private val allTableRes = tableRes("*")
+ private val allColumnRes = columnRes("*")
+ private val srcTableRes = tableRes("src")
+
+ // policy type
+ private val POLICY_TYPE_ACCESS: Int = 0
+ private val POLICY_TYPE_DATAMASK: Int = 1
+ private val POLICY_TYPE_ROWFILTER: Int = 2
+
+ // policies
+ private val policyAccessForAllUrl = KRangerPolicy(
+ name = "all - url",
+ description = "Policy for all - url",
+ resources = Map("url" -> KRangerPolicyResource(
+ values = List("*"),
+ isRecursive = true)),
+ policyItems = List(KRangerPolicyItem(
+ users = List(admin),
+ accesses = allowTypes(select, update, create, drop, alter, index, lock, all, read, write),
+ delegateAdmin = true)))
+
+ private val policyAccessForAllDbTableColumns = KRangerPolicy(
+ name = "all - database, table, column",
+ description = "Policy for all - database, table, column",
+ resources = Map(allDatabaseRes, allTableRes, allColumnRes),
+ policyItems = List(KRangerPolicyItem(
+ users = List(admin),
+ accesses = allowTypes(select, update, create, drop, alter, index, lock, all, read, write),
+ delegateAdmin = true)))
+
+ private val policyAccessForAllDbUdf = KRangerPolicy(
+ name = "all - database, udf",
+ description = "Policy for all - database, udf",
+ resources = Map(allDatabaseRes, "udf" -> KRangerPolicyResource(values = List("*"))),
+ policyItems = List(KRangerPolicyItem(
+ users = List(admin),
+ accesses = allowTypes(select, update, create, drop, alter, index, lock, all, read, write),
+ delegateAdmin = true)))
+
+ private val policyAccessForDbAllColumns = KRangerPolicy(
+    name = "default",
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog, icebergNamespace, namespace1),
+ allTableRes,
+ allColumnRes),
+ policyItems = List(
+ KRangerPolicyItem(
+ users = List(bob, permViewUser, ownerPlaceHolder),
+ accesses = allowTypes(select, update, create, drop, alter, index, lock, all, read, write),
+ delegateAdmin = true),
+ KRangerPolicyItem(
+ users = List(defaultTableOwner, createOnlyUser),
+ accesses = allowTypes(create),
+ delegateAdmin = true)))
+
+ private val policyAccessForDefaultDbSrcTable = KRangerPolicy(
+ name = "default_kent",
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ srcTableRes,
+ columnRes("key")),
+ policyItems = List(
+ KRangerPolicyItem(
+ users = List(kent),
+ accesses = allowTypes(select, update, create, drop, alter, index, lock, all, read, write),
+ delegateAdmin = true),
+ KRangerPolicyItem(
+ users = List(defaultTableOwner, createOnlyUser),
+ accesses = allowTypes(create),
+ delegateAdmin = true)))
+
+ private val policyFilterForSrcTableKeyLessThan20 = KRangerPolicy(
+ name = "src_key_less_than_20",
+ policyType = POLICY_TYPE_ROWFILTER,
+ resources = Map(
+ databaseRes(defaultDb),
+ srcTableRes),
+ rowFilterPolicyItems = List(
+ KRangerRowFilterPolicyItem(
+ rowFilterInfo = KRangerPolicyItemRowFilterInfo(filterExpr = "key<20"),
+ accesses = allowTypes(select),
+ users = List(bob, permViewUser))))
+
+ private val policyFilterForPermViewKeyLessThan20 = KRangerPolicy(
+ name = "perm_view_key_less_than_20",
+ policyType = POLICY_TYPE_ROWFILTER,
+ resources = Map(
+ databaseRes(defaultDb),
+ tableRes("perm_view")),
+ rowFilterPolicyItems = List(
+ KRangerRowFilterPolicyItem(
+ rowFilterInfo = KRangerPolicyItemRowFilterInfo(filterExpr = "key<20"),
+ accesses = allowTypes(select),
+ users = List(permViewUser))))
+
+ private val policyAccessForDefaultBobUse = KRangerPolicy(
+ name = "default_bob_use",
+ resources = Map(
+ databaseRes("default_bob", sparkCatalog),
+ tableRes("table_use*"),
+ allColumnRes),
+ policyItems = List(
+ KRangerPolicyItem(
+ users = List(bob),
+ accesses = allowTypes(update),
+ delegateAdmin = true)))
+
+ private val policyAccessForDefaultBobSelect = KRangerPolicy(
+ name = "default_bob_select",
+ resources = Map(
+ databaseRes("default_bob", sparkCatalog),
+ tableRes("table_select*"),
+ allColumnRes),
+ policyItems = List(
+ KRangerPolicyItem(
+ users = List(bob),
+ accesses = allowTypes(select, use),
+ delegateAdmin = true)))
+
+ private val policyMaskForPermView = KRangerPolicy(
+ name = "src_value_hash_perm_view",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ srcTableRes,
+ columnRes("value1")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK_HASH"),
+ users = List(bob),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyMaskForPermViewUser = KRangerPolicy(
+ name = "src_value_hash",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ tableRes("perm_view"),
+ columnRes("value1")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK_HASH"),
+ users = List(permViewUser),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyMaskNullifyForValue2 = KRangerPolicy(
+ name = "src_value2_nullify",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog, icebergNamespace, namespace1),
+ srcTableRes,
+ columnRes("value2")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK"),
+ users = List(bob),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyMaskShowFirst4ForValue3 = KRangerPolicy(
+ name = "src_value3_sf4",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ srcTableRes,
+ columnRes("value3")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK_SHOW_FIRST_4"),
+ users = List(bob),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyMaskDateShowYearForValue4 = KRangerPolicy(
+ name = "src_value4_sf4",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ srcTableRes,
+ columnRes("value4")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK_DATE_SHOW_YEAR"),
+ users = List(bob),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyMaskShowFirst4ForValue5 = KRangerPolicy(
+ name = "src_value5_sf4",
+ policyType = POLICY_TYPE_DATAMASK,
+ resources = Map(
+ databaseRes(defaultDb, sparkCatalog),
+ srcTableRes,
+ columnRes("value5")),
+ dataMaskPolicyItems = List(
+ KRangerDataMaskPolicyItem(
+ dataMaskInfo = KRangerPolicyItemDataMaskInfo(dataMaskType = "MASK_SHOW_LAST_4"),
+ users = List(bob),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+
+ private val policyAccessForPermViewAccessOnly = KRangerPolicy(
+ name = "someone_access_perm_view",
+ resources = Map(
+ databaseRes(defaultDb),
+ tableRes("perm_view"),
+ allColumnRes),
+ policyItems = List(
+ KRangerPolicyItem(
+ users = List(permViewOnlyUser),
+ accesses = allowTypes(select),
+ delegateAdmin = true)))
+}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/RangerGenWrapper.scala b/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/RangerGenWrapper.scala
new file mode 100644
index 00000000000..71bce375972
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-authz/src/test/gen/scala/org/apache/kyuubi/plugin/spark/authz/gen/RangerGenWrapper.scala
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.plugin.spark.authz.gen
+
+import scala.collection.convert.ImplicitConversions._
+import scala.language.implicitConversions
+
+import org.apache.ranger.plugin.model.RangerPolicy
+import org.apache.ranger.plugin.model.RangerPolicy._
+
+import org.apache.kyuubi.plugin.spark.authz.gen.RangerClassConversions._
+
+trait RangerObjectGenerator[T] {
+ def get: T
+}
+
+object RangerClassConversions {
+ implicit def getRangerObject[T](g: RangerObjectGenerator[T]): T = g.get
+}
+
+case class KRangerPolicy(
+ service: String = "hive_jenkins",
+ name: String,
+ policyType: Int = 0,
+ description: String = "",
+ isAuditEnabled: Boolean = true,
+ resources: Map[String, RangerPolicyResource] = Map.empty,
+ conditions: List[RangerPolicyItemCondition] = List.empty,
+ policyItems: List[RangerPolicyItem] = List.empty,
+ denyPolicyItems: List[RangerPolicyItem] = List.empty,
+ allowExceptions: List[RangerPolicyItem] = List.empty,
+ denyExceptions: List[RangerPolicyItem] = List.empty,
+ dataMaskPolicyItems: List[RangerDataMaskPolicyItem] = List.empty,
+ rowFilterPolicyItems: List[RangerRowFilterPolicyItem] = List.empty,
+ id: Int = 0,
+ guid: String = "",
+ isEnabled: Boolean = true,
+ version: Int = 1) extends RangerObjectGenerator[RangerPolicy] {
+ override def get: RangerPolicy = {
+ val p = new RangerPolicy()
+ p.setService(service)
+ p.setName(name)
+ p.setPolicyType(policyType)
+ p.setDescription(description)
+ p.setIsAuditEnabled(isAuditEnabled)
+ p.setResources(resources)
+ p.setConditions(conditions)
+    p.setPolicyItems(policyItems)
+    p.setDenyPolicyItems(denyPolicyItems)
+ p.setAllowExceptions(allowExceptions)
+ p.setDenyExceptions(denyExceptions)
+ p.setDataMaskPolicyItems(dataMaskPolicyItems)
+ p.setRowFilterPolicyItems(rowFilterPolicyItems)
+ p.setId(id)
+ p.setGuid(guid)
+    p.setIsEnabled(isEnabled)
+ p.setVersion(version)
+ p
+ }
+}
+
+case class KRangerPolicyResource(
+ values: List[String] = List.empty,
+ isExcludes: Boolean = false,
+ isRecursive: Boolean = false) extends RangerObjectGenerator[RangerPolicyResource] {
+ override def get: RangerPolicyResource = {
+ val r = new RangerPolicyResource()
+ r.setValues(values)
+ r.setIsExcludes(isExcludes)
+ r.setIsRecursive(isRecursive)
+ r
+ }
+}
+
+object KRangerPolicyResource {
+ def databaseRes(values: String*): (String, RangerPolicyResource) =
+ "database" -> KRangerPolicyResource(values.toList)
+
+ def tableRes(values: String*): (String, RangerPolicyResource) =
+ "table" -> KRangerPolicyResource(values.toList)
+
+ def columnRes(values: String*): (String, RangerPolicyResource) =
+ "column" -> KRangerPolicyResource(values.toList)
+}
+
+case class KRangerPolicyItemCondition(
+ `type`: String,
+ values: List[String]) extends RangerObjectGenerator[RangerPolicyItemCondition] {
+ override def get: RangerPolicyItemCondition = {
+ val c = new RangerPolicyItemCondition()
+ c.setType(`type`)
+ c.setValues(values)
+ c
+ }
+}
+
+case class KRangerPolicyItem(
+ accesses: List[RangerPolicyItemAccess] = List.empty,
+ users: List[String] = List.empty,
+ groups: List[String] = List.empty,
+ conditions: List[RangerPolicyItemCondition] = List.empty,
+ delegateAdmin: Boolean = false) extends RangerObjectGenerator[RangerPolicyItem] {
+ override def get: RangerPolicyItem = {
+ val i = new RangerPolicyItem()
+ i.setAccesses(accesses)
+ i.setUsers(users)
+ i.setGroups(groups)
+ i.setConditions(conditions)
+ i.setDelegateAdmin(delegateAdmin)
+ i
+ }
+}
+
+case class KRangerPolicyItemAccess(
+ `type`: String,
+ isAllowed: Boolean) extends RangerObjectGenerator[RangerPolicyItemAccess] {
+ override def get: RangerPolicyItemAccess = {
+ val a = new RangerPolicyItemAccess
+ a.setType(`type`)
+ a.setIsAllowed(isAllowed)
+ a
+ }
+}
+
+object KRangerPolicyItemAccess {
+ def allowTypes(types: String*): List[RangerPolicyItemAccess] =
+ types.map(t => KRangerPolicyItemAccess(t, isAllowed = true).get).toList
+}
+
+case class KRangerDataMaskPolicyItem(
+ dataMaskInfo: RangerPolicyItemDataMaskInfo,
+ accesses: List[RangerPolicyItemAccess] = List.empty,
+ users: List[String] = List.empty,
+ groups: List[String] = List.empty,
+ conditions: List[RangerPolicyItemCondition] = List.empty,
+ delegateAdmin: Boolean = false) extends RangerObjectGenerator[RangerDataMaskPolicyItem] {
+ override def get: RangerDataMaskPolicyItem = {
+ val i = new RangerDataMaskPolicyItem
+ i.setDataMaskInfo(dataMaskInfo)
+ i.setAccesses(accesses)
+ i.setUsers(users)
+ i.setGroups(groups)
+ i.setConditions(conditions)
+ i.setDelegateAdmin(delegateAdmin)
+ i
+ }
+}
+
+case class KRangerPolicyItemDataMaskInfo(
+ dataMaskType: String) extends RangerObjectGenerator[RangerPolicyItemDataMaskInfo] {
+ override def get: RangerPolicyItemDataMaskInfo = {
+ val i = new RangerPolicyItemDataMaskInfo
+ i.setDataMaskType(dataMaskType)
+ i
+ }
+}
+
+case class KRangerRowFilterPolicyItem(
+ rowFilterInfo: RangerPolicyItemRowFilterInfo,
+ accesses: List[RangerPolicyItemAccess] = List.empty,
+ users: List[String] = List.empty,
+ groups: List[String] = List.empty,
+ conditions: List[RangerPolicyItemCondition] = List.empty,
+ delegateAdmin: Boolean = false) extends RangerObjectGenerator[RangerRowFilterPolicyItem] {
+ override def get: RangerRowFilterPolicyItem = {
+ val i = new RangerRowFilterPolicyItem
+ i.setRowFilterInfo(rowFilterInfo)
+ i.setAccesses(accesses)
+ i.setUsers(users)
+ i.setGroups(groups)
+ i.setConditions(conditions)
+ i.setDelegateAdmin(delegateAdmin)
+ i
+ }
+}
+
+case class KRangerPolicyItemRowFilterInfo(
+ filterExpr: String) extends RangerObjectGenerator[RangerPolicyItemRowFilterInfo] {
+ override def get: RangerPolicyItemRowFilterInfo = {
+ val i = new RangerPolicyItemRowFilterInfo
+ i.setFilterExpr(filterExpr)
+ i
+ }
+}
+
+object RangerAccessType {
+ val select = "select"
+ val update = "update"
+ val create = "create"
+ val drop = "drop"
+ val alter = "alter"
+ val index = "index"
+ val lock = "lock"
+ val all = "all"
+ val read = "read"
+ val write = "write"
+ val use = "use"
+}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/resources/policies_base.json b/extensions/spark/kyuubi-spark-authz/src/test/resources/policies_base.json
new file mode 100644
index 00000000000..aea5d2a9c28
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-authz/src/test/resources/policies_base.json
@@ -0,0 +1,1678 @@
+{
+ "serviceName": "hive_jenkins",
+ "serviceId": 1,
+ "policyVersion": 85,
+ "policyUpdateTime": "20190429-21:36:09.000-+0800",
+ "policies": [
+ {
+ "service": "hive_jenkins",
+ "name": "all - url",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "Policy for all - url",
+ "isAuditEnabled": true,
+ "resources": {
+ "url": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": true
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "update",
+ "isAllowed": true
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": true
+ },
+ {
+ "type": "alter",
+ "isAllowed": true
+ },
+ {
+ "type": "index",
+ "isAllowed": true
+ },
+ {
+ "type": "lock",
+ "isAllowed": true
+ },
+ {
+ "type": "all",
+ "isAllowed": true
+ },
+ {
+ "type": "read",
+ "isAllowed": true
+ },
+ {
+ "type": "write",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "admin"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": true
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [],
+ "id": 1,
+ "guid": "cf7e6725-492f-434f-bffe-6bb4e3147246",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "all - database, table, column",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "Policy for all - database, table, column",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "update",
+ "isAllowed": true
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": true
+ },
+ {
+ "type": "alter",
+ "isAllowed": true
+ },
+ {
+ "type": "index",
+ "isAllowed": true
+ },
+ {
+ "type": "lock",
+ "isAllowed": true
+ },
+ {
+ "type": "all",
+ "isAllowed": true
+ },
+ {
+ "type": "read",
+ "isAllowed": true
+ },
+ {
+ "type": "write",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "admin"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": true
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [],
+ "id": 2,
+ "guid": "3b96138a-af4d-48bc-9544-58c5bfa1979b",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "all - database, udf",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "Policy for all - database, udf",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "udf": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "update",
+ "isAllowed": true
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": true
+ },
+ {
+ "type": "alter",
+ "isAllowed": true
+ },
+ {
+ "type": "index",
+ "isAllowed": true
+ },
+ {
+ "type": "lock",
+ "isAllowed": true
+ },
+ {
+ "type": "all",
+ "isAllowed": true
+ },
+ {
+ "type": "read",
+ "isAllowed": true
+ },
+ {
+ "type": "write",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "admin"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": true
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [],
+ "id": 3,
+ "guid": "db08fbb0-61da-4f33-8144-ccd89816151d",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "default",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog",
+ "iceberg_ns",
+ "ns1"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "update",
+ "isAllowed": true
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": true
+ },
+ {
+ "type": "alter",
+ "isAllowed": true
+ },
+ {
+ "type": "index",
+ "isAllowed": true
+ },
+ {
+ "type": "lock",
+ "isAllowed": true
+ },
+ {
+ "type": "all",
+ "isAllowed": true
+ },
+ {
+ "type": "read",
+ "isAllowed": true
+ },
+ {
+ "type": "write",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob",
+ "perm_view_user",
+ "{OWNER}"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ },
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": false
+ },
+ {
+ "type": "update",
+ "isAllowed": false
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": false
+ },
+ {
+ "type": "alter",
+ "isAllowed": false
+ },
+ {
+ "type": "index",
+ "isAllowed": false
+ },
+ {
+ "type": "lock",
+ "isAllowed": false
+ },
+ {
+ "type": "all",
+ "isAllowed": false
+ },
+ {
+ "type": "read",
+ "isAllowed": false
+ },
+ {
+ "type": "write",
+ "isAllowed": false
+ }
+ ],
+ "users": [
+ "default_table_owner",
+ "create_only_user"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 5,
+ "guid": "2db6099d-e4f1-41df-9d24-f2f47bed618e",
+ "isEnabled": true,
+ "version": 5
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "default_kent",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "key"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "update",
+ "isAllowed": true
+ },
+ {
+ "type": "create",
+ "isAllowed": true
+ },
+ {
+ "type": "drop",
+ "isAllowed": true
+ },
+ {
+ "type": "alter",
+ "isAllowed": true
+ },
+ {
+ "type": "index",
+ "isAllowed": true
+ },
+ {
+ "type": "lock",
+ "isAllowed": true
+ },
+ {
+ "type": "all",
+ "isAllowed": true
+ },
+ {
+ "type": "read",
+ "isAllowed": true
+ },
+ {
+ "type": "write",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "kent"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 5,
+ "guid": "fd24db19-f7cc-4e13-a8ba-bbd5a07a2d8d",
+ "isEnabled": true,
+ "version": 5
+ },
+ {
+ "service": "hive_jenkins",
+    "name": "src_key_less_than_20",
+ "policyType": 2,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [
+ {
+ "rowFilterInfo": {
+ "filterExpr": "key\u003c20"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "serviceType": "hive",
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 4,
+ "guid": "f588a9ed-f7b1-48f7-9d0d-c12cf2b9b7ed",
+ "isEnabled": true,
+ "version": 26
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_key_less_than_20_perm_view",
+ "policyType": 2,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "perm_view"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [
+ {
+ "rowFilterInfo": {
+ "filterExpr": "key\u003c20"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "perm_view_user"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "serviceType": "hive",
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 22,
+        "guid": "c240a7ea-9d26-4db2-b925-d5dbe49bd447",
+ "isEnabled": true,
+ "version": 26
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "default_bob_use",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default_bob",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "table_use*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "update",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 5,
+ "guid": "2eb6099d-e4f1-41df-9d24-f2f47bed618e",
+ "isEnabled": true,
+ "version": 5
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "default_bob_select",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default_bob",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "table_select*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ },
+ {
+ "type": "use",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 5,
+ "guid": "2fb6099d-e4f1-41df-9d24-f2f47bed618e",
+ "isEnabled": true,
+ "version": 5
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value_hash_perm_view",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value1"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK_HASH"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 5,
+ "guid": "ed1868a1-bf79-4721-a3d5-6815cc7d4986",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value_hash",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value1"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "perm_view"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK_HASH"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "perm_view_user"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 20,
+ "guid": "bfeddeab-50d0-4902-985f-42559efa39c3",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value2_nullify",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog",
+ "iceberg_ns",
+ "ns1"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value2"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 6,
+ "guid": "98a04cd7-8d14-4466-adc9-126d87a3af69",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value3_sf4",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value3"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK_SHOW_FIRST_4"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 7,
+ "guid": "9d50a525-b24c-4cf5-a885-d10d426368d1",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value4_sf4",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value4"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK_DATE_SHOW_YEAR"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 8,
+ "guid": "9d50a526-b24c-4cf5-a885-d10d426368d1",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "src_value5_show_last_4",
+ "policyType": 1,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default",
+ "spark_catalog"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "value5"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "src"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [
+ {
+ "dataMaskInfo": {
+ "dataMaskType": "MASK_SHOW_LAST_4"
+ },
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "bob"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 32,
+ "guid": "b3f1f1e0-2bd6-4b20-8a32-a531006ae151",
+ "isEnabled": true,
+ "version": 1
+ },
+ {
+ "service": "hive_jenkins",
+ "name": "someone_access_perm_view",
+ "policyType": 0,
+ "policyPriority": 0,
+ "description": "",
+ "isAuditEnabled": true,
+ "resources": {
+ "database": {
+ "values": [
+ "default"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "column": {
+ "values": [
+ "*"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ },
+ "table": {
+ "values": [
+ "perm_view"
+ ],
+ "isExcludes": false,
+ "isRecursive": false
+ }
+ },
+ "policyItems": [
+ {
+ "accesses": [
+ {
+ "type": "select",
+ "isAllowed": true
+ }
+ ],
+ "users": [
+ "user_perm_view_only"
+ ],
+ "groups": [],
+ "conditions": [],
+ "delegateAdmin": false
+ }
+ ],
+ "denyPolicyItems": [],
+ "allowExceptions": [],
+ "denyExceptions": [],
+ "dataMaskPolicyItems": [],
+ "rowFilterPolicyItems": [],
+ "options": {},
+ "validitySchedules": [],
+ "policyLabels": [
+ ""
+ ],
+ "id": 123,
+ "guid": "2fb6099d-e421-41df-9d24-f2f47bed618e",
+ "isEnabled": true,
+ "version": 5
+ }
+ ],
+ "serviceDef": {
+ "name": "hive",
+ "implClass": "org.apache.ranger.services.hive.RangerServiceHive",
+ "label": "Hive Server2",
+ "description": "Hive Server2",
+ "options": {
+ "enableDenyAndExceptionsInPolicies": "true"
+ },
+ "configs": [
+ {
+ "itemId": 1,
+ "name": "username",
+ "type": "string",
+ "mandatory": true,
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Username"
+ },
+ {
+ "itemId": 2,
+ "name": "password",
+ "type": "password",
+ "mandatory": true,
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Password"
+ },
+ {
+ "itemId": 3,
+ "name": "jdbc.driverClassName",
+ "type": "string",
+ "mandatory": true,
+ "defaultValue": "org.apache.hive.jdbc.HiveDriver",
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": ""
+ },
+ {
+ "itemId": 4,
+ "name": "jdbc.url",
+ "type": "string",
+ "mandatory": true,
+ "defaultValue": "",
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{\"TextFieldWithIcon\":true, \"info\": \"1.For Remote Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;\u003cbr\u003e2.For Embedded Mode (no host or port), eg.\u003cbr\u003ejdbc:hive2:///;initFile\u003d\u0026lt;file\u0026gt;\u003cbr\u003e3.For HTTP Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;\u003cbr\u003etransportMode\u003dhttp;httpPath\u003d\u0026lt;httpPath\u0026gt;\u003cbr\u003e4.For SSL Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;ssl\u003dtrue;\u003cbr\u003esslTrustStore\u003dtStore;trustStorePassword\u003dpw\u003cbr\u003e5.For ZooKeeper Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;/;serviceDiscoveryMode\u003d\u003cbr\u003ezooKeeper;zooKeeperNamespace\u003dhiveserver2\u003cbr\u003e6.For Kerberos Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;\u003cbr\u003eprincipal\u003dhive/domain@EXAMPLE.COM\u003cbr\u003e\"}"
+ },
+ {
+ "itemId": 5,
+ "name": "commonNameForCertificate",
+ "type": "string",
+ "mandatory": false,
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Common Name for Certificate"
+ }
+ ],
+ "resources": [
+ {
+ "itemId": 1,
+ "name": "database",
+ "type": "string",
+ "level": 10,
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": true,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "true",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Hive Database",
+ "description": "Hive Database",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": false
+ },
+ {
+ "itemId": 5,
+ "name": "url",
+ "type": "string",
+ "level": 10,
+ "mandatory": true,
+ "lookupSupported": false,
+ "recursiveSupported": true,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "true",
+ "ignoreCase": "false"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "URL",
+ "description": "URL",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": true
+ },
+ {
+ "itemId": 2,
+ "name": "table",
+ "type": "string",
+ "level": 20,
+ "parent": "database",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": true,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "true",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Hive Table",
+ "description": "Hive Table",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": false
+ },
+ {
+ "itemId": 3,
+ "name": "udf",
+ "type": "string",
+ "level": 20,
+ "parent": "database",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": true,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "true",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Hive UDF",
+ "description": "Hive UDF",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": true
+ },
+ {
+ "itemId": 4,
+ "name": "column",
+ "type": "string",
+ "level": 30,
+ "parent": "table",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": true,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "true",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "",
+ "label": "Hive Column",
+ "description": "Hive Column",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": true
+ }
+ ],
+ "accessTypes": [
+ {
+ "itemId": 1,
+ "name": "select",
+ "label": "select",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 2,
+ "name": "update",
+ "label": "update",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 3,
+ "name": "create",
+ "label": "Create",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 4,
+ "name": "drop",
+ "label": "Drop",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 5,
+ "name": "alter",
+ "label": "Alter",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 6,
+ "name": "index",
+ "label": "Index",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 7,
+ "name": "lock",
+ "label": "Lock",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 8,
+ "name": "all",
+ "label": "All",
+ "impliedGrants": [
+ "select",
+ "update",
+ "create",
+ "drop",
+ "alter",
+ "index",
+ "lock",
+ "read",
+ "write"
+ ]
+ },
+ {
+ "itemId": 9,
+ "name": "read",
+ "label": "Read",
+ "impliedGrants": []
+ },
+ {
+ "itemId": 10,
+ "name": "write",
+ "label": "Write",
+ "impliedGrants": []
+ }
+ ],
+ "policyConditions": [],
+ "contextEnrichers": [],
+ "enums": [],
+ "dataMaskDef": {
+ "maskTypes": [
+ {
+ "itemId": 1,
+ "name": "MASK",
+ "label": "Redact",
+ "description": "Replace lowercase with \u0027x\u0027, uppercase with \u0027X\u0027, digits with \u00270\u0027",
+ "transformer": "mask({col})",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 2,
+ "name": "MASK_SHOW_LAST_4",
+ "label": "Partial mask: show last 4",
+ "description": "Show last 4 characters; replace rest with \u0027x\u0027",
+ "transformer": "mask_show_last_n({col}, 4, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027)",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 3,
+ "name": "MASK_SHOW_FIRST_4",
+ "label": "Partial mask: show first 4",
+ "description": "Show first 4 characters; replace rest with \u0027x\u0027",
+ "transformer": "mask_show_first_n({col}, 4, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027)",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 4,
+ "name": "MASK_HASH",
+ "label": "Hash",
+ "description": "Hash the value",
+ "transformer": "mask_hash({col})",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 5,
+ "name": "MASK_NULL",
+ "label": "Nullify",
+ "description": "Replace with NULL",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 6,
+ "name": "MASK_NONE",
+ "label": "Unmasked (retain original value)",
+ "description": "No masking",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 12,
+ "name": "MASK_DATE_SHOW_YEAR",
+ "label": "Date: show only year",
+ "description": "Date: show only year",
+ "transformer": "mask({col}, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027, 1, 0, -1)",
+ "dataMaskOptions": {}
+ },
+ {
+ "itemId": 13,
+ "name": "CUSTOM",
+ "label": "Custom",
+ "description": "Custom",
+ "dataMaskOptions": {}
+ }
+ ],
+ "accessTypes": [
+ {
+ "itemId": 1,
+ "name": "select",
+ "label": "select",
+ "impliedGrants": []
+ }
+ ],
+ "resources": [
+ {
+ "itemId": 1,
+ "name": "database",
+ "type": "string",
+ "level": 10,
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "false",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{ \"singleValue\":true }",
+ "label": "Hive Database",
+ "description": "Hive Database",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": false
+ },
+ {
+ "itemId": 2,
+ "name": "table",
+ "type": "string",
+ "level": 20,
+ "parent": "database",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "false",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{ \"singleValue\":true }",
+ "label": "Hive Table",
+ "description": "Hive Table",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": false
+ },
+ {
+ "itemId": 4,
+ "name": "column",
+ "type": "string",
+ "level": 30,
+ "parent": "table",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "false",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{ \"singleValue\":true }",
+ "label": "Hive Column",
+ "description": "Hive Column",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": true
+ }
+ ]
+ },
+ "rowFilterDef": {
+ "accessTypes": [
+ {
+ "itemId": 1,
+ "name": "select",
+ "label": "select",
+ "impliedGrants": []
+ }
+ ],
+ "resources": [
+ {
+ "itemId": 1,
+ "name": "database",
+ "type": "string",
+ "level": 10,
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "false",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{ \"singleValue\":true }",
+ "label": "Hive Database",
+ "description": "Hive Database",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": false
+ },
+ {
+ "itemId": 2,
+ "name": "table",
+ "type": "string",
+ "level": 20,
+ "parent": "database",
+ "mandatory": true,
+ "lookupSupported": true,
+ "recursiveSupported": false,
+ "excludesSupported": false,
+ "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions": {
+ "wildCard": "false",
+ "ignoreCase": "true"
+ },
+ "validationRegEx": "",
+ "validationMessage": "",
+ "uiHint": "{ \"singleValue\":true }",
+ "label": "Hive Table",
+ "description": "Hive Table",
+ "accessTypeRestrictions": [],
+ "isValidLeaf": true
+ }
+ ]
+ },
+ "id": 3,
+ "guid": "3e1afb5a-184a-4e82-9d9c-87a5cacc243c",
+ "isEnabled": true,
+ "createTime": "20190401-20:14:36.000-+0800",
+ "updateTime": "20190401-20:14:36.000-+0800",
+ "version": 1
+ },
+ "auditMode": "audit-default"
+}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/resources/sparkSql_hive_jenkins.json b/extensions/spark/kyuubi-spark-authz/src/test/resources/sparkSql_hive_jenkins.json
index 250df2ddc59..6c160d3216a 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/resources/sparkSql_hive_jenkins.json
+++ b/extensions/spark/kyuubi-spark-authz/src/test/resources/sparkSql_hive_jenkins.json
@@ -1,1675 +1,1353 @@
{
- "serviceName": "hive_jenkins",
- "serviceId": 1,
- "policyVersion": 85,
- "policyUpdateTime": "20190429-21:36:09.000-+0800",
- "policies": [
- {
- "service": "hive_jenkins",
- "name": "all - url",
- "policyType": 0,
- "policyPriority": 0,
- "description": "Policy for all - url",
- "isAuditEnabled": true,
- "resources": {
- "url": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": true
- }
- },
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "update",
- "isAllowed": true
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": true
- },
- {
- "type": "alter",
- "isAllowed": true
- },
- {
- "type": "index",
- "isAllowed": true
- },
- {
- "type": "lock",
- "isAllowed": true
- },
- {
- "type": "all",
- "isAllowed": true
- },
- {
- "type": "read",
- "isAllowed": true
- },
- {
- "type": "write",
- "isAllowed": true
- }
- ],
- "users": [
- "admin"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": true
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [],
- "id": 1,
- "guid": "cf7e6725-492f-434f-bffe-6bb4e3147246",
- "isEnabled": true,
- "version": 1
- },
- {
- "service": "hive_jenkins",
- "name": "all - database, table, column",
- "policyType": 0,
- "policyPriority": 0,
- "description": "Policy for all - database, table, column",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
- },
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "update",
- "isAllowed": true
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": true
- },
- {
- "type": "alter",
- "isAllowed": true
- },
- {
- "type": "index",
- "isAllowed": true
- },
- {
- "type": "lock",
- "isAllowed": true
- },
- {
- "type": "all",
- "isAllowed": true
- },
- {
- "type": "read",
- "isAllowed": true
- },
- {
- "type": "write",
- "isAllowed": true
- }
- ],
- "users": [
- "admin"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": true
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [],
- "id": 2,
- "guid": "3b96138a-af4d-48bc-9544-58c5bfa1979b",
- "isEnabled": true,
- "version": 1
+ "serviceName" : "hive_jenkins",
+ "serviceId" : 1,
+ "policyVersion" : 85,
+ "policyUpdateTime" : "20190429-21:36:09.000-+0800",
+ "policies" : [ {
+ "id" : 0,
+ "guid" : "cfcd2084-95d5-35ef-a6e7-dff9f98764da",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "all - url",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "Policy for all - url",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "url" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : true
+ }
},
- {
- "service": "hive_jenkins",
- "name": "all - database, udf",
- "policyType": 0,
- "policyPriority": 0,
- "description": "Policy for all - database, udf",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "udf": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "update",
+ "isAllowed" : true
+ }, {
+ "type" : "create",
+ "isAllowed" : true
+ }, {
+ "type" : "drop",
+ "isAllowed" : true
+ }, {
+ "type" : "alter",
+ "isAllowed" : true
+ }, {
+ "type" : "index",
+ "isAllowed" : true
+ }, {
+ "type" : "lock",
+ "isAllowed" : true
+ }, {
+ "type" : "all",
+ "isAllowed" : true
+ }, {
+ "type" : "read",
+ "isAllowed" : true
+ }, {
+ "type" : "write",
+ "isAllowed" : true
+ } ],
+ "users" : [ "admin" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 1,
+ "guid" : "c4ca4238-a0b9-3382-8dcc-509a6f75849b",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "all - database, table, column",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "Policy for all - database, table, column",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "update",
- "isAllowed": true
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": true
- },
- {
- "type": "alter",
- "isAllowed": true
- },
- {
- "type": "index",
- "isAllowed": true
- },
- {
- "type": "lock",
- "isAllowed": true
- },
- {
- "type": "all",
- "isAllowed": true
- },
- {
- "type": "read",
- "isAllowed": true
- },
- {
- "type": "write",
- "isAllowed": true
- }
- ],
- "users": [
- "admin"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": true
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [],
- "id": 3,
- "guid": "db08fbb0-61da-4f33-8144-ccd89816151d",
- "isEnabled": true,
- "version": 1
- },
- {
- "service": "hive_jenkins",
- "name": "default",
- "policyType": 0,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog",
- "iceberg_ns",
- "ns1"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "column" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "update",
- "isAllowed": true
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": true
- },
- {
- "type": "alter",
- "isAllowed": true
- },
- {
- "type": "index",
- "isAllowed": true
- },
- {
- "type": "lock",
- "isAllowed": true
- },
- {
- "type": "all",
- "isAllowed": true
- },
- {
- "type": "read",
- "isAllowed": true
- },
- {
- "type": "write",
- "isAllowed": true
- }
- ],
- "users": [
- "bob",
- "perm_view_user",
- "{OWNER}"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }, {
- "accesses": [
- {
- "type": "select",
- "isAllowed": false
- },
- {
- "type": "update",
- "isAllowed": false
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": false
- },
- {
- "type": "alter",
- "isAllowed": false
- },
- {
- "type": "index",
- "isAllowed": false
- },
- {
- "type": "lock",
- "isAllowed": false
- },
- {
- "type": "all",
- "isAllowed": false
- },
- {
- "type": "read",
- "isAllowed": false
- },
- {
- "type": "write",
- "isAllowed": false
- }
- ],
- "users": [
- "default_table_owner",
- "create_only_user"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 5,
- "guid": "2db6099d-e4f1-41df-9d24-f2f47bed618e",
- "isEnabled": true,
- "version": 5
+ "table" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "default_kent",
- "policyType": 0,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "key"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "update",
+ "isAllowed" : true
+ }, {
+ "type" : "create",
+ "isAllowed" : true
+ }, {
+ "type" : "drop",
+ "isAllowed" : true
+ }, {
+ "type" : "alter",
+ "isAllowed" : true
+ }, {
+ "type" : "index",
+ "isAllowed" : true
+ }, {
+ "type" : "lock",
+ "isAllowed" : true
+ }, {
+ "type" : "all",
+ "isAllowed" : true
+ }, {
+ "type" : "read",
+ "isAllowed" : true
+ }, {
+ "type" : "write",
+ "isAllowed" : true
+ } ],
+ "users" : [ "admin" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 2,
+ "guid" : "c81e728d-9d4c-3f63-af06-7f89cc14862c",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "all - database, udf",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "Policy for all - database, udf",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "update",
- "isAllowed": true
- },
- {
- "type": "create",
- "isAllowed": true
- },
- {
- "type": "drop",
- "isAllowed": true
- },
- {
- "type": "alter",
- "isAllowed": true
- },
- {
- "type": "index",
- "isAllowed": true
- },
- {
- "type": "lock",
- "isAllowed": true
- },
- {
- "type": "all",
- "isAllowed": true
- },
- {
- "type": "read",
- "isAllowed": true
- },
- {
- "type": "write",
- "isAllowed": true
- }
- ],
- "users": [
- "kent"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 5,
- "guid": "fd24db19-f7cc-4e13-a8ba-bbd5a07a2d8d",
- "isEnabled": true,
- "version": 5
+ "udf" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "src_key _less_than_20",
- "policyType": 2,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "update",
+ "isAllowed" : true
+ }, {
+ "type" : "create",
+ "isAllowed" : true
+ }, {
+ "type" : "drop",
+ "isAllowed" : true
+ }, {
+ "type" : "alter",
+ "isAllowed" : true
+ }, {
+ "type" : "index",
+ "isAllowed" : true
+ }, {
+ "type" : "lock",
+ "isAllowed" : true
+ }, {
+ "type" : "all",
+ "isAllowed" : true
+ }, {
+ "type" : "read",
+ "isAllowed" : true
+ }, {
+ "type" : "write",
+ "isAllowed" : true
+ } ],
+ "users" : [ "admin" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 3,
+ "guid" : "eccbc87e-4b5c-32fe-a830-8fd9f2a7baf3",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "all - database, udf",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "Policy for all - database, udf",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog", "iceberg_ns", "ns1" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [
- {
- "rowFilterInfo": {
- "filterExpr": "key\u003c20"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "serviceType": "hive",
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 4,
- "guid": "f588a9ed-f7b1-48f7-9d0d-c12cf2b9b7ed",
- "isEnabled": true,
- "version": 26
- },{
- "service": "hive_jenkins",
- "name": "src_key_less_than_20_perm_view",
- "policyType": 2,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "perm_view"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "column" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [
- {
- "rowFilterInfo": {
- "filterExpr": "key\u003c20"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "perm_view_user"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "serviceType": "hive",
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 22,
- "guid": "c240a7ea-9d26-4db2-b925-d5dbe49bd447 \n",
- "isEnabled": true,
- "version": 26
+ "table" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "default_bob_use",
- "policyType": 0,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default_bob",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "table_use*"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "update",
+ "isAllowed" : true
+ }, {
+ "type" : "create",
+ "isAllowed" : true
+ }, {
+ "type" : "drop",
+ "isAllowed" : true
+ }, {
+ "type" : "alter",
+ "isAllowed" : true
+ }, {
+ "type" : "index",
+ "isAllowed" : true
+ }, {
+ "type" : "lock",
+ "isAllowed" : true
+ }, {
+ "type" : "all",
+ "isAllowed" : true
+ }, {
+ "type" : "read",
+ "isAllowed" : true
+ }, {
+ "type" : "write",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob", "perm_view_user", "{OWNER}" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ }, {
+ "accesses" : [ {
+ "type" : "create",
+ "isAllowed" : true
+ } ],
+ "users" : [ "default_table_owner", "create_only_user" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 4,
+ "guid" : "a87ff679-a2f3-371d-9181-a67b7542122c",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "default_kent",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "update",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 5,
- "guid": "2eb6099d-e4f1-41df-9d24-f2f47bed618e",
- "isEnabled": true,
- "version": 5
- },
- {
- "service": "hive_jenkins",
- "name": "default_bob_select",
- "policyType": 0,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default_bob",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "table_select*"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "column" : {
+ "values" : [ "key" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- },
- {
- "type": "use",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 5,
- "guid": "2fb6099d-e4f1-41df-9d24-f2f47bed618e",
- "isEnabled": true,
- "version": 5
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "src_value_hash_perm_view",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value1"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "update",
+ "isAllowed" : true
+ }, {
+ "type" : "create",
+ "isAllowed" : true
+ }, {
+ "type" : "drop",
+ "isAllowed" : true
+ }, {
+ "type" : "alter",
+ "isAllowed" : true
+ }, {
+ "type" : "index",
+ "isAllowed" : true
+ }, {
+ "type" : "lock",
+ "isAllowed" : true
+ }, {
+ "type" : "all",
+ "isAllowed" : true
+ }, {
+ "type" : "read",
+ "isAllowed" : true
+ }, {
+ "type" : "write",
+ "isAllowed" : true
+ } ],
+ "users" : [ "kent" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ }, {
+ "accesses" : [ {
+ "type" : "create",
+ "isAllowed" : true
+ } ],
+ "users" : [ "default_table_owner", "create_only_user" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 5,
+ "guid" : "e4da3b7f-bbce-3345-9777-2b0674a318d5",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "default_bob_use",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default_bob", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK_HASH"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 5,
- "guid": "ed1868a1-bf79-4721-a3d5-6815cc7d4986",
- "isEnabled": true,
- "version": 1
- },{
- "service": "hive_jenkins",
- "name": "src_value_hash",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value1"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "perm_view"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "column" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK_HASH"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "perm_view_user"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 20,
- "guid": "bfeddeab-50d0-4902-985f-42559efa39c3",
- "isEnabled": true,
- "version": 1
+ "table" : {
+ "values" : [ "table_use*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "src_value2_nullify",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog",
- "iceberg_ns",
- "ns1"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value2"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "update",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 6,
+ "guid" : "1679091c-5a88-3faf-afb5-e6087eb1b2dc",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "default_bob_select",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default_bob", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 6,
- "guid": "98a04cd7-8d14-4466-adc9-126d87a3af69",
- "isEnabled": true,
- "version": 1
- },
- {
- "service": "hive_jenkins",
- "name": "src_value3_sf4",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value3"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "column" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK_SHOW_FIRST_4"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 7,
- "guid": "9d50a525-b24c-4cf5-a885-d10d426368d1",
- "isEnabled": true,
- "version": 1
+ "table" : {
+ "values" : [ "table_select*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "src_value4_sf4",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value4"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ }, {
+ "type" : "use",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 7,
+ "guid" : "8f14e45f-ceea-367a-9a36-dedd4bea2543",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "someone_access_perm_view",
+ "policyType" : 0,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK_DATE_SHOW_YEAR"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 8,
- "guid": "9d50a526-b24c-4cf5-a885-d10d426368d1",
- "isEnabled": true,
- "version": 1
+ "column" : {
+ "values" : [ "*" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ },
+ "table" : {
+ "values" : [ "perm_view" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "src_value5_show_last_4",
- "policyType": 1,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default",
- "spark_catalog"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "value5"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "src"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "user_perm_view_only" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true
+ } ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 8,
+ "guid" : "c9f0f895-fb98-3b91-99f5-1fd0297e236d",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_key_less_than_20",
+ "policyType" : 2,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [
- {
- "dataMaskInfo": {
- "dataMaskType": "MASK_SHOW_LAST_4"
- },
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "bob"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 32,
- "guid": "b3f1f1e0-2bd6-4b20-8a32-a531006ae151",
- "isEnabled": true,
- "version": 1
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- {
- "service": "hive_jenkins",
- "name": "someone_access_perm_view",
- "policyType": 0,
- "policyPriority": 0,
- "description": "",
- "isAuditEnabled": true,
- "resources": {
- "database": {
- "values": [
- "default"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "column": {
- "values": [
- "*"
- ],
- "isExcludes": false,
- "isRecursive": false
- },
- "table": {
- "values": [
- "perm_view"
- ],
- "isExcludes": false,
- "isRecursive": false
- }
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob", "perm_view_user" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : false,
+ "rowFilterInfo" : {
+ "filterExpr" : "key<20"
+ }
+ } ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 9,
+ "guid" : "45c48cce-2e2d-3fbd-aa1a-fc51c7c6ad26",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "perm_view_key_less_than_20",
+ "policyType" : 2,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- "policyItems": [
- {
- "accesses": [
- {
- "type": "select",
- "isAllowed": true
- }
- ],
- "users": [
- "user_perm_view_only"
- ],
- "groups": [],
- "conditions": [],
- "delegateAdmin": false
- }
- ],
- "denyPolicyItems": [],
- "allowExceptions": [],
- "denyExceptions": [],
- "dataMaskPolicyItems": [],
- "rowFilterPolicyItems": [],
- "options": {},
- "validitySchedules": [],
- "policyLabels": [
- ""
- ],
- "id": 123,
- "guid": "2fb6099d-e421-41df-9d24-f2f47bed618e",
- "isEnabled": true,
- "version": 5
- }
- ],
- "serviceDef": {
- "name": "hive",
- "implClass": "org.apache.ranger.services.hive.RangerServiceHive",
- "label": "Hive Server2",
- "description": "Hive Server2",
- "options": {
- "enableDenyAndExceptionsInPolicies": "true"
+ "table" : {
+ "values" : [ "perm_view" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
},
- "configs": [
- {
- "itemId": 1,
- "name": "username",
- "type": "string",
- "mandatory": true,
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Username"
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ ],
+ "rowFilterPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "perm_view_user" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : false,
+ "rowFilterInfo" : {
+ "filterExpr" : "key<20"
+ }
+ } ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 10,
+ "guid" : "d3d94468-02a4-3259-b55d-38e6d163e820",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value_hash_perm_view",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 2,
- "name": "password",
- "type": "password",
- "mandatory": true,
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Password"
+ "column" : {
+ "values" : [ "value1" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 3,
- "name": "jdbc.driverClassName",
- "type": "string",
- "mandatory": true,
- "defaultValue": "org.apache.hive.jdbc.HiveDriver",
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": ""
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK_HASH"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 11,
+ "guid" : "6512bd43-d9ca-36e0-ac99-0b0a82652dca",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value_hash",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 4,
- "name": "jdbc.url",
- "type": "string",
- "mandatory": true,
- "defaultValue": "",
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{\"TextFieldWithIcon\":true, \"info\": \"1.For Remote Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;\u003cbr\u003e2.For Embedded Mode (no host or port), eg.\u003cbr\u003ejdbc:hive2:///;initFile\u003d\u0026lt;file\u0026gt;\u003cbr\u003e3.For HTTP Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;\u003cbr\u003etransportMode\u003dhttp;httpPath\u003d\u0026lt;httpPath\u0026gt;\u003cbr\u003e4.For SSL Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;ssl\u003dtrue;\u003cbr\u003esslTrustStore\u003dtStore;trustStorePassword\u003dpw\u003cbr\u003e5.For ZooKeeper Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;/;serviceDiscoveryMode\u003d\u003cbr\u003ezooKeeper;zooKeeperNamespace\u003dhiveserver2\u003cbr\u003e6.For Kerberos Mode, eg.\u003cbr\u003ejdbc:hive2://\u0026lt;host\u0026gt;:\u0026lt;port\u0026gt;/;\u003cbr\u003eprincipal\u003dhive/domain@EXAMPLE.COM\u003cbr\u003e\"}"
+ "column" : {
+ "values" : [ "value1" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 5,
- "name": "commonNameForCertificate",
- "type": "string",
- "mandatory": false,
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Common Name for Certificate"
+ "table" : {
+ "values" : [ "perm_view" ],
+ "isExcludes" : false,
+ "isRecursive" : false
}
- ],
- "resources": [
- {
- "itemId": 1,
- "name": "database",
- "type": "string",
- "level": 10,
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": true,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "true",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Hive Database",
- "description": "Hive Database",
- "accessTypeRestrictions": [],
- "isValidLeaf": false
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "perm_view_user" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK_HASH"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 12,
+ "guid" : "c20ad4d7-6fe9-3759-aa27-a0c99bff6710",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value2_nullify",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog", "iceberg_ns", "ns1" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 5,
- "name": "url",
- "type": "string",
- "level": 10,
- "mandatory": true,
- "lookupSupported": false,
- "recursiveSupported": true,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
- "matcherOptions": {
- "wildCard": "true",
- "ignoreCase": "false"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "URL",
- "description": "URL",
- "accessTypeRestrictions": [],
- "isValidLeaf": true
+ "column" : {
+ "values" : [ "value2" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 2,
- "name": "table",
- "type": "string",
- "level": 20,
- "parent": "database",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": true,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "true",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Hive Table",
- "description": "Hive Table",
- "accessTypeRestrictions": [],
- "isValidLeaf": false
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 13,
+ "guid" : "c51ce410-c124-310e-8db5-e4b97fc2af39",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value3_sf4",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 3,
- "name": "udf",
- "type": "string",
- "level": 20,
- "parent": "database",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": true,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "true",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Hive UDF",
- "description": "Hive UDF",
- "accessTypeRestrictions": [],
- "isValidLeaf": true
+ "column" : {
+ "values" : [ "value3" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 4,
- "name": "column",
- "type": "string",
- "level": 30,
- "parent": "table",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": true,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "true",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "",
- "label": "Hive Column",
- "description": "Hive Column",
- "accessTypeRestrictions": [],
- "isValidLeaf": true
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
}
- ],
- "accessTypes": [
- {
- "itemId": 1,
- "name": "select",
- "label": "select",
- "impliedGrants": []
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK_SHOW_FIRST_4"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 14,
+ "guid" : "aab32389-22bc-325a-af60-6eb525ffdc56",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value4_sf4",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 2,
- "name": "update",
- "label": "update",
- "impliedGrants": []
+ "column" : {
+ "values" : [ "value4" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 3,
- "name": "create",
- "label": "Create",
- "impliedGrants": []
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK_DATE_SHOW_YEAR"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ }, {
+ "id" : 15,
+ "guid" : "9bf31c7f-f062-336a-96d3-c8bd1f8f2ff3",
+ "isEnabled" : true,
+ "version" : 1,
+ "service" : "hive_jenkins",
+ "name" : "src_value5_sf4",
+ "policyType" : 1,
+ "policyPriority" : 0,
+ "description" : "",
+ "isAuditEnabled" : true,
+ "resources" : {
+ "database" : {
+ "values" : [ "default", "spark_catalog" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 4,
- "name": "drop",
- "label": "Drop",
- "impliedGrants": []
+ "column" : {
+ "values" : [ "value5" ],
+ "isExcludes" : false,
+ "isRecursive" : false
},
- {
- "itemId": 5,
- "name": "alter",
- "label": "Alter",
- "impliedGrants": []
+ "table" : {
+ "values" : [ "src" ],
+ "isExcludes" : false,
+ "isRecursive" : false
+ }
+ },
+ "conditions" : [ ],
+ "policyItems" : [ ],
+ "denyPolicyItems" : [ ],
+ "allowExceptions" : [ ],
+ "denyExceptions" : [ ],
+ "dataMaskPolicyItems" : [ {
+ "accesses" : [ {
+ "type" : "select",
+ "isAllowed" : true
+ } ],
+ "users" : [ "bob" ],
+ "groups" : [ ],
+ "roles" : [ ],
+ "conditions" : [ ],
+ "delegateAdmin" : true,
+ "dataMaskInfo" : {
+ "dataMaskType" : "MASK_SHOW_LAST_4"
+ }
+ } ],
+ "rowFilterPolicyItems" : [ ],
+ "options" : { },
+ "validitySchedules" : [ ],
+ "policyLabels" : [ ],
+ "isDenyAllElse" : false
+ } ],
+ "serviceDef" : {
+ "name" : "hive",
+ "implClass" : "org.apache.ranger.services.hive.RangerServiceHive",
+ "label" : "Hive Server2",
+ "description" : "Hive Server2",
+ "options" : {
+ "enableDenyAndExceptionsInPolicies" : "true"
+ },
+ "configs" : [ {
+ "itemId" : 1,
+ "name" : "username",
+ "type" : "string",
+ "mandatory" : true,
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Username"
+ }, {
+ "itemId" : 2,
+ "name" : "password",
+ "type" : "password",
+ "mandatory" : true,
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Password"
+ }, {
+ "itemId" : 3,
+ "name" : "jdbc.driverClassName",
+ "type" : "string",
+ "mandatory" : true,
+ "defaultValue" : "org.apache.hive.jdbc.HiveDriver",
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : ""
+ }, {
+ "itemId" : 4,
+ "name" : "jdbc.url",
+ "type" : "string",
+ "mandatory" : true,
+ "defaultValue" : "",
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{\"TextFieldWithIcon\":true, \"info\": \"1.For Remote Mode, eg. jdbc:hive2://<host>:<port> 2.For Embedded Mode (no host or port), eg. jdbc:hive2:///;initFile=<file> 3.For HTTP Mode, eg. jdbc:hive2://<host>:<port>/; transportMode=http;httpPath=<httpPath> 4.For SSL Mode, eg. jdbc:hive2://<host>:<port>/;ssl=true; sslTrustStore=tStore;trustStorePassword=pw 5.For ZooKeeper Mode, eg. jdbc:hive2://<host>/;serviceDiscoveryMode= zooKeeper;zooKeeperNamespace=hiveserver2 6.For Kerberos Mode, eg. jdbc:hive2://<host>:<port>/; principal=hive/domain@EXAMPLE.COM \"}"
+ }, {
+ "itemId" : 5,
+ "name" : "commonNameForCertificate",
+ "type" : "string",
+ "mandatory" : false,
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Common Name for Certificate"
+ } ],
+ "resources" : [ {
+ "itemId" : 1,
+ "name" : "database",
+ "type" : "string",
+ "level" : 10,
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : true,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "true",
+ "ignoreCase" : "true"
},
- {
- "itemId": 6,
- "name": "index",
- "label": "Index",
- "impliedGrants": []
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Hive Database",
+ "description" : "Hive Database",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : false
+ }, {
+ "itemId" : 5,
+ "name" : "url",
+ "type" : "string",
+ "level" : 10,
+ "mandatory" : true,
+ "lookupSupported" : false,
+ "recursiveSupported" : true,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "true",
+ "ignoreCase" : "false"
},
- {
- "itemId": 7,
- "name": "lock",
- "label": "Lock",
- "impliedGrants": []
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "URL",
+ "description" : "URL",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : true
+ }, {
+ "itemId" : 2,
+ "name" : "table",
+ "type" : "string",
+ "level" : 20,
+ "parent" : "database",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : true,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "true",
+ "ignoreCase" : "true"
},
- {
- "itemId": 8,
- "name": "all",
- "label": "All",
- "impliedGrants": [
- "select",
- "update",
- "create",
- "drop",
- "alter",
- "index",
- "lock",
- "read",
- "write"
- ]
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Hive Table",
+ "description" : "Hive Table",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : false
+ }, {
+ "itemId" : 3,
+ "name" : "udf",
+ "type" : "string",
+ "level" : 20,
+ "parent" : "database",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : true,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "true",
+ "ignoreCase" : "true"
},
- {
- "itemId": 9,
- "name": "read",
- "label": "Read",
- "impliedGrants": []
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Hive UDF",
+ "description" : "Hive UDF",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : true
+ }, {
+ "itemId" : 4,
+ "name" : "column",
+ "type" : "string",
+ "level" : 30,
+ "parent" : "table",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : true,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "true",
+ "ignoreCase" : "true"
},
- {
- "itemId": 10,
- "name": "write",
- "label": "Write",
- "impliedGrants": []
- }
- ],
- "policyConditions": [],
- "contextEnrichers": [],
- "enums": [],
- "dataMaskDef": {
- "maskTypes": [
- {
- "itemId": 1,
- "name": "MASK",
- "label": "Redact",
- "description": "Replace lowercase with \u0027x\u0027, uppercase with \u0027X\u0027, digits with \u00270\u0027",
- "transformer": "mask({col})",
- "dataMaskOptions": {}
- },
- {
- "itemId": 2,
- "name": "MASK_SHOW_LAST_4",
- "label": "Partial mask: show last 4",
- "description": "Show last 4 characters; replace rest with \u0027x\u0027",
- "transformer": "mask_show_last_n({col}, 4, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027)",
- "dataMaskOptions": {}
- },
- {
- "itemId": 3,
- "name": "MASK_SHOW_FIRST_4",
- "label": "Partial mask: show first 4",
- "description": "Show first 4 characters; replace rest with \u0027x\u0027",
- "transformer": "mask_show_first_n({col}, 4, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027)",
- "dataMaskOptions": {}
- },
- {
- "itemId": 4,
- "name": "MASK_HASH",
- "label": "Hash",
- "description": "Hash the value",
- "transformer": "mask_hash({col})",
- "dataMaskOptions": {}
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "",
+ "label" : "Hive Column",
+ "description" : "Hive Column",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : true
+ } ],
+ "accessTypes" : [ {
+ "itemId" : 1,
+ "name" : "select",
+ "label" : "select",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 2,
+ "name" : "update",
+ "label" : "update",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 3,
+ "name" : "create",
+ "label" : "Create",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 4,
+ "name" : "drop",
+ "label" : "Drop",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 5,
+ "name" : "alter",
+ "label" : "Alter",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 6,
+ "name" : "index",
+ "label" : "Index",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 7,
+ "name" : "lock",
+ "label" : "Lock",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 8,
+ "name" : "all",
+ "label" : "All",
+ "impliedGrants" : [ "select", "update", "create", "drop", "alter", "index", "lock", "read", "write" ]
+ }, {
+ "itemId" : 9,
+ "name" : "read",
+ "label" : "Read",
+ "impliedGrants" : [ ]
+ }, {
+ "itemId" : 10,
+ "name" : "write",
+ "label" : "Write",
+ "impliedGrants" : [ ]
+ } ],
+ "policyConditions" : [ ],
+ "contextEnrichers" : [ ],
+ "enums" : [ ],
+ "dataMaskDef" : {
+ "maskTypes" : [ {
+ "itemId" : 1,
+ "name" : "MASK",
+ "label" : "Redact",
+ "description" : "Replace lowercase with 'x', uppercase with 'X', digits with '0'",
+ "transformer" : "mask({col})",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 2,
+ "name" : "MASK_SHOW_LAST_4",
+ "label" : "Partial mask: show last 4",
+ "description" : "Show last 4 characters; replace rest with 'x'",
+ "transformer" : "mask_show_last_n({col}, 4, 'x', 'x', 'x', -1, '1')",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 3,
+ "name" : "MASK_SHOW_FIRST_4",
+ "label" : "Partial mask: show first 4",
+ "description" : "Show first 4 characters; replace rest with 'x'",
+ "transformer" : "mask_show_first_n({col}, 4, 'x', 'x', 'x', -1, '1')",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 4,
+ "name" : "MASK_HASH",
+ "label" : "Hash",
+ "description" : "Hash the value",
+ "transformer" : "mask_hash({col})",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 5,
+ "name" : "MASK_NULL",
+ "label" : "Nullify",
+ "description" : "Replace with NULL",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 6,
+ "name" : "MASK_NONE",
+ "label" : "Unmasked (retain original value)",
+ "description" : "No masking",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 12,
+ "name" : "MASK_DATE_SHOW_YEAR",
+ "label" : "Date: show only year",
+ "description" : "Date: show only year",
+ "transformer" : "mask({col}, 'x', 'x', 'x', -1, '1', 1, 0, -1)",
+ "dataMaskOptions" : { }
+ }, {
+ "itemId" : 13,
+ "name" : "CUSTOM",
+ "label" : "Custom",
+ "description" : "Custom",
+ "dataMaskOptions" : { }
+ } ],
+ "accessTypes" : [ {
+ "itemId" : 1,
+ "name" : "select",
+ "label" : "select",
+ "impliedGrants" : [ ]
+ } ],
+ "resources" : [ {
+ "itemId" : 1,
+ "name" : "database",
+ "type" : "string",
+ "level" : 10,
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "false",
+ "ignoreCase" : "true"
},
- {
- "itemId": 5,
- "name": "MASK_NULL",
- "label": "Nullify",
- "description": "Replace with NULL",
- "dataMaskOptions": {}
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{ \"singleValue\":true }",
+ "label" : "Hive Database",
+ "description" : "Hive Database",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : false
+ }, {
+ "itemId" : 2,
+ "name" : "table",
+ "type" : "string",
+ "level" : 20,
+ "parent" : "database",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "false",
+ "ignoreCase" : "true"
},
- {
- "itemId": 6,
- "name": "MASK_NONE",
- "label": "Unmasked (retain original value)",
- "description": "No masking",
- "dataMaskOptions": {}
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{ \"singleValue\":true }",
+ "label" : "Hive Table",
+ "description" : "Hive Table",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : false
+ }, {
+ "itemId" : 4,
+ "name" : "column",
+ "type" : "string",
+ "level" : 30,
+ "parent" : "table",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "false",
+ "ignoreCase" : "true"
},
- {
- "itemId": 12,
- "name": "MASK_DATE_SHOW_YEAR",
- "label": "Date: show only year",
- "description": "Date: show only year",
- "transformer": "mask({col}, \u0027x\u0027, \u0027x\u0027, \u0027x\u0027, -1, \u00271\u0027, 1, 0, -1)",
- "dataMaskOptions": {}
- },
- {
- "itemId": 13,
- "name": "CUSTOM",
- "label": "Custom",
- "description": "Custom",
- "dataMaskOptions": {}
- }
- ],
- "accessTypes": [
- {
- "itemId": 1,
- "name": "select",
- "label": "select",
- "impliedGrants": []
- }
- ],
- "resources": [
- {
- "itemId": 1,
- "name": "database",
- "type": "string",
- "level": 10,
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "false",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{ \"singleValue\":true }",
- "label": "Hive Database",
- "description": "Hive Database",
- "accessTypeRestrictions": [],
- "isValidLeaf": false
- },
- {
- "itemId": 2,
- "name": "table",
- "type": "string",
- "level": 20,
- "parent": "database",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "false",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{ \"singleValue\":true }",
- "label": "Hive Table",
- "description": "Hive Table",
- "accessTypeRestrictions": [],
- "isValidLeaf": false
- },
- {
- "itemId": 4,
- "name": "column",
- "type": "string",
- "level": 30,
- "parent": "table",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "false",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{ \"singleValue\":true }",
- "label": "Hive Column",
- "description": "Hive Column",
- "accessTypeRestrictions": [],
- "isValidLeaf": true
- }
- ]
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{ \"singleValue\":true }",
+ "label" : "Hive Column",
+ "description" : "Hive Column",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : true
+ } ]
},
- "rowFilterDef": {
- "accessTypes": [
- {
- "itemId": 1,
- "name": "select",
- "label": "select",
- "impliedGrants": []
- }
- ],
- "resources": [
- {
- "itemId": 1,
- "name": "database",
- "type": "string",
- "level": 10,
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "false",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{ \"singleValue\":true }",
- "label": "Hive Database",
- "description": "Hive Database",
- "accessTypeRestrictions": [],
- "isValidLeaf": false
+ "rowFilterDef" : {
+ "accessTypes" : [ {
+ "itemId" : 1,
+ "name" : "select",
+ "label" : "select",
+ "impliedGrants" : [ ]
+ } ],
+ "resources" : [ {
+ "itemId" : 1,
+ "name" : "database",
+ "type" : "string",
+ "level" : 10,
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "false",
+ "ignoreCase" : "true"
+ },
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{ \"singleValue\":true }",
+ "label" : "Hive Database",
+ "description" : "Hive Database",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : false
+ }, {
+ "itemId" : 2,
+ "name" : "table",
+ "type" : "string",
+ "level" : 20,
+ "parent" : "database",
+ "mandatory" : true,
+ "lookupSupported" : true,
+ "recursiveSupported" : false,
+ "excludesSupported" : false,
+ "matcher" : "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
+ "matcherOptions" : {
+ "wildCard" : "false",
+ "ignoreCase" : "true"
},
- {
- "itemId": 2,
- "name": "table",
- "type": "string",
- "level": 20,
- "parent": "database",
- "mandatory": true,
- "lookupSupported": true,
- "recursiveSupported": false,
- "excludesSupported": false,
- "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
- "matcherOptions": {
- "wildCard": "false",
- "ignoreCase": "true"
- },
- "validationRegEx": "",
- "validationMessage": "",
- "uiHint": "{ \"singleValue\":true }",
- "label": "Hive Table",
- "description": "Hive Table",
- "accessTypeRestrictions": [],
- "isValidLeaf": true
- }
- ]
+ "validationRegEx" : "",
+ "validationMessage" : "",
+ "uiHint" : "{ \"singleValue\":true }",
+ "label" : "Hive Table",
+ "description" : "Hive Table",
+ "accessTypeRestrictions" : [ ],
+ "isValidLeaf" : true
+ } ]
},
- "id": 3,
- "guid": "3e1afb5a-184a-4e82-9d9c-87a5cacc243c",
- "isEnabled": true,
- "createTime": "20190401-20:14:36.000-+0800",
- "updateTime": "20190401-20:14:36.000-+0800",
- "version": 1
+ "id" : 3,
+ "guid" : "3e1afb5a-184a-4e82-9d9c-87a5cacc243c",
+ "isEnabled" : true,
+ "createTime" : "20190401-20:14:36.000-+0800",
+ "updateTime" : "20190401-20:14:36.000-+0800",
+ "version" : 1
},
- "auditMode": "audit-default"
-}
+ "auditMode" : "audit-default"
+}
\ No newline at end of file
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/FunctionPrivilegesBuilderSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/FunctionPrivilegesBuilderSuite.scala
new file mode 100644
index 00000000000..ad4b57faa93
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/FunctionPrivilegesBuilderSuite.scala
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.spark.authz
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+// scalastyle:off
+import org.scalatest.funsuite.AnyFunSuite
+
+import org.apache.kyuubi.plugin.spark.authz.OperationType.QUERY
+import org.apache.kyuubi.plugin.spark.authz.ranger.AccessType
+
+abstract class FunctionPrivilegesBuilderSuite extends AnyFunSuite
+ with SparkSessionProvider with BeforeAndAfterAll with BeforeAndAfterEach {
+ // scalastyle:on
+
+ protected def withTable(t: String)(f: String => Unit): Unit = {
+ try {
+ f(t)
+ } finally {
+ sql(s"DROP TABLE IF EXISTS $t")
+ }
+ }
+
+ protected def withDatabase(t: String)(f: String => Unit): Unit = {
+ try {
+ f(t)
+ } finally {
+ sql(s"DROP DATABASE IF EXISTS $t")
+ }
+ }
+
+ protected def checkColumns(plan: LogicalPlan, cols: Seq[String]): Unit = {
+ val (in, out, _) = PrivilegesBuilder.build(plan, spark)
+ assert(out.isEmpty, "Queries shall not check output privileges")
+ val po = in.head
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assert(po.columns === cols)
+ }
+
+ protected def checkColumns(query: String, cols: Seq[String]): Unit = {
+ checkColumns(sql(query).queryExecution.optimizedPlan, cols)
+ }
+
+ protected val reusedDb: String = getClass.getSimpleName
+ protected val reusedDb2: String = getClass.getSimpleName + "2"
+ protected val reusedTable: String = reusedDb + "." + getClass.getSimpleName
+ protected val reusedTableShort: String = reusedTable.split("\\.").last
+ protected val reusedPartTable: String = reusedTable + "_part"
+ protected val reusedPartTableShort: String = reusedPartTable.split("\\.").last
+ protected val functionCount = 3
+ protected val functionNamePrefix = "kyuubi_fun_"
+ protected val tempFunNamePrefix = "kyuubi_temp_fun_"
+
+ override def beforeAll(): Unit = {
+ sql(s"CREATE DATABASE IF NOT EXISTS $reusedDb")
+ sql(s"CREATE DATABASE IF NOT EXISTS $reusedDb2")
+ sql(s"CREATE TABLE IF NOT EXISTS $reusedTable" +
+ s" (key int, value string) USING parquet")
+ sql(s"CREATE TABLE IF NOT EXISTS $reusedPartTable" +
+ s" (key int, value string, pid string) USING parquet" +
+ s" PARTITIONED BY(pid)")
+ // scalastyle:off
+ (0 until functionCount).foreach { index =>
+ {
+ sql(s"CREATE FUNCTION ${reusedDb}.${functionNamePrefix}${index} AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskHash'")
+ sql(s"CREATE FUNCTION ${reusedDb2}.${functionNamePrefix}${index} AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskHash'")
+ sql(s"CREATE TEMPORARY FUNCTION ${tempFunNamePrefix}${index} AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskHash'")
+ }
+ }
+ sql(s"USE ${reusedDb2}")
+ // scalastyle:on
+ super.beforeAll()
+ }
+
+ override def afterAll(): Unit = {
+ Seq(reusedTable, reusedPartTable).foreach { t =>
+ sql(s"DROP TABLE IF EXISTS $t")
+ }
+
+ Seq(reusedDb, reusedDb2).foreach { db =>
+ (0 until functionCount).foreach { index =>
+ sql(s"DROP FUNCTION ${db}.${functionNamePrefix}${index}")
+ }
+ sql(s"DROP DATABASE IF EXISTS ${db}")
+ }
+
+ spark.stop()
+ super.afterAll()
+ }
+}
+
+class HiveFunctionPrivilegesBuilderSuite extends FunctionPrivilegesBuilderSuite {
+
+ override protected val catalogImpl: String = "hive"
+
+ test("Function Call Query") {
+ val plan = sql(s"SELECT kyuubi_fun_1('data'), " +
+ s"kyuubi_fun_2(value), " +
+ s"${reusedDb}.kyuubi_fun_0(value), " +
+ s"kyuubi_temp_fun_1('data2')," +
+ s"kyuubi_temp_fun_2(key) " +
+ s"FROM $reusedTable").queryExecution.analyzed
+ val (inputs, _, _) = PrivilegesBuilder.buildFunctions(plan, spark)
+ assert(inputs.size === 3)
+ inputs.foreach { po =>
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
+ assert(po.dbname startsWith reusedDb.toLowerCase)
+ assert(po.objectName startsWith functionNamePrefix.toLowerCase)
+ val accessType = ranger.AccessType(po, QUERY, isInput = true)
+ assert(accessType === AccessType.SELECT)
+ }
+ }
+
+ test("Function Call Query with Quoted Name") {
+ val plan = sql(s"SELECT `kyuubi_fun_1`('data'), " +
+ s"`kyuubi_fun_2`(value), " +
+ s"`${reusedDb}`.`kyuubi_fun_0`(value), " +
+ s"`kyuubi_temp_fun_1`('data2')," +
+ s"`kyuubi_temp_fun_2`(key) " +
+ s"FROM $reusedTable").queryExecution.analyzed
+ val (inputs, _, _) = PrivilegesBuilder.buildFunctions(plan, spark)
+ assert(inputs.size === 3)
+ inputs.foreach { po =>
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
+ assert(po.dbname startsWith reusedDb.toLowerCase)
+ assert(po.objectName startsWith functionNamePrefix.toLowerCase)
+ val accessType = ranger.AccessType(po, QUERY, isInput = true)
+ assert(accessType === AccessType.SELECT)
+ }
+ }
+
+ test("Simple Function Call Query") {
+ val plan = sql(s"SELECT kyuubi_fun_1('data'), " +
+ s"kyuubi_fun_0('value'), " +
+ s"${reusedDb}.kyuubi_fun_0('value'), " +
+ s"${reusedDb}.kyuubi_fun_2('value'), " +
+ s"kyuubi_temp_fun_1('data2')," +
+ s"kyuubi_temp_fun_2('key') ").queryExecution.analyzed
+ val (inputs, _, _) = PrivilegesBuilder.buildFunctions(plan, spark)
+ assert(inputs.size === 4)
+ inputs.foreach { po =>
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
+ assert(po.dbname startsWith reusedDb.toLowerCase)
+ assert(po.objectName startsWith functionNamePrefix.toLowerCase)
+ val accessType = ranger.AccessType(po, QUERY, isInput = true)
+ assert(accessType === AccessType.SELECT)
+ }
+ }
+
+  test("Function Call In CTAS Command") {
+ val table = "castTable"
+ withTable(table) { table =>
+ val plan = sql(s"CREATE TABLE ${table} " +
+ s"SELECT kyuubi_fun_1('data') col1, " +
+ s"${reusedDb2}.kyuubi_fun_2(value) col2, " +
+ s"kyuubi_fun_0(value) col3, " +
+ s"kyuubi_fun_2('value') col4, " +
+ s"${reusedDb}.kyuubi_fun_2('value') col5, " +
+ s"${reusedDb}.kyuubi_fun_1('value') col6, " +
+ s"kyuubi_temp_fun_1('data2') col7, " +
+ s"kyuubi_temp_fun_2(key) col8 " +
+ s"FROM ${reusedTable} WHERE ${reusedDb2}.kyuubi_fun_1(key)='123'").queryExecution.analyzed
+ val (inputs, _, _) = PrivilegesBuilder.buildFunctions(plan, spark)
+ assert(inputs.size === 7)
+ inputs.foreach { po =>
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
+ assert(po.dbname startsWith reusedDb.toLowerCase)
+ assert(po.objectName startsWith functionNamePrefix.toLowerCase)
+ val accessType = ranger.AccessType(po, QUERY, isInput = true)
+ assert(accessType === AccessType.SELECT)
+ }
+ }
+ }
+
+}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/IcebergCatalogPrivilegesBuilderSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/IcebergCatalogPrivilegesBuilderSuite.scala
index 81397038920..45186e2502d 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/IcebergCatalogPrivilegesBuilderSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/IcebergCatalogPrivilegesBuilderSuite.scala
@@ -22,7 +22,11 @@ import org.scalatest.Outcome
import org.apache.kyuubi.Utils
import org.apache.kyuubi.plugin.spark.authz.OperationType._
import org.apache.kyuubi.plugin.spark.authz.ranger.AccessType
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.tags.IcebergTest
+import org.apache.kyuubi.util.AssertionUtils._
+@IcebergTest
class IcebergCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
override protected val catalogImpl: String = "hive"
override protected val sqlExtensions: String =
@@ -64,8 +68,8 @@ class IcebergCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -81,8 +85,8 @@ class IcebergCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -104,8 +108,8 @@ class IcebergCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
val po0 = inputs.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname === namespace)
- assert(po0.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(namespace)(po0.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po0.objectName)
assert(po0.columns === Seq("key", "value"))
checkV2TableOwner(po0)
@@ -113,12 +117,34 @@ class IcebergCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.UPDATE)
}
}
+
+ test("RewriteDataFilesProcedure") {
+ val table = "RewriteDataFilesProcedure"
+ withV2Table(table) { tableId =>
+ sql(s"CREATE TABLE IF NOT EXISTS $tableId (key int, value String) USING iceberg")
+ sql(s"INSERT INTO $tableId VALUES (1, 'a'), (2, 'b'), (3, 'c')")
+
+ val plan = sql(s"CALL $catalogV2.system.rewrite_data_files (table => '$tableId')")
+ .queryExecution.analyzed
+ val (inputs, outputs, operationType) = PrivilegesBuilder.build(plan, spark)
+ assert(operationType === ALTERTABLE_PROPERTIES)
+ assert(inputs.size === 0)
+ assert(outputs.size === 1)
+ val po = outputs.head
+ assert(po.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
+ val accessType = AccessType(po, operationType, isInput = false)
+ assert(accessType === AccessType.ALTER)
+ }
+ }
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilderSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilderSuite.scala
index 43929091769..723fabd7b67 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilderSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilderSuite.scala
@@ -30,9 +30,11 @@ import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
import org.scalatest.funsuite.AnyFunSuite
import org.apache.kyuubi.plugin.spark.authz.OperationType._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.ranger.AccessType
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils.isSparkVersionAtMost
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.AssertionUtils._
abstract class PrivilegesBuilderSuite extends AnyFunSuite
with SparkSessionProvider with BeforeAndAfterAll with BeforeAndAfterEach {
@@ -110,7 +112,7 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("AlterDatabasePropertiesCommand") {
- assume(isSparkVersionAtMost("3.2"))
+ assume(SPARK_RUNTIME_VERSION <= "3.2")
val plan = sql("ALTER DATABASE default SET DBPROPERTIES (abc = '123')").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
assertResult(plan.getClass.getName)(
@@ -122,8 +124,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === "default")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(defaultDb)(po.objectName)
assert(po.columns.isEmpty)
}
@@ -147,8 +149,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
out.foreach { po =>
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(Set(oldTableShort, "efg").contains(po.objectName))
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertExistsIgnoreCase(po.objectName)(Set(oldTableShort, "efg"))
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType == AccessType.ALTER)
@@ -158,7 +160,7 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("CreateDatabaseCommand") {
- assume(isSparkVersionAtMost("3.2"))
+ assume(SPARK_RUNTIME_VERSION <= "3.2")
withDatabase("CreateDatabaseCommand") { db =>
val plan = sql(s"CREATE DATABASE $db").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
@@ -171,8 +173,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.isEmpty)
- assert(po.dbname === "CreateDatabaseCommand")
- assert(po.objectName === "CreateDatabaseCommand")
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase(db)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -180,7 +182,7 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("DropDatabaseCommand") {
- assume(isSparkVersionAtMost("3.2"))
+ assume(SPARK_RUNTIME_VERSION <= "3.2")
withDatabase("DropDatabaseCommand") { db =>
sql(s"CREATE DATABASE $db")
val plan = sql(s"DROP DATABASE DropDatabaseCommand").queryExecution.analyzed
@@ -194,8 +196,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.isEmpty)
- assert(po.dbname === "DropDatabaseCommand")
- assert(po.objectName === "DropDatabaseCommand")
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase(db)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.DROP)
@@ -212,8 +214,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -230,8 +232,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -263,8 +265,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase tableName.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName.split("\\.").last)(po.objectName)
assert(po.columns.isEmpty)
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -286,8 +288,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -309,8 +311,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === reusedDb)
- assert(po.objectName === reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -331,8 +333,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === reusedDb)
- assert(po.objectName === reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -350,8 +352,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po0.objectName)
if (isSparkV32OrGreater) {
// Query in AlterViewAsCommand can not be resolved before SPARK-34698
assert(po0.columns === Seq("key", "value", "pid"))
@@ -365,8 +367,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === (if (isSparkV2) null else "default"))
- assert(po.objectName === "AlterViewAsCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("AlterViewAsCommand")(po.objectName)
checkTableOwner(po)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -377,41 +379,62 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val plan = sql(s"ANALYZE TABLE $reusedPartTable PARTITION (pid=1)" +
s" COMPUTE STATISTICS FOR COLUMNS key").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
- assert(operationType === ANALYZE_TABLE)
+ assert(operationType === ALTERTABLE_PROPERTIES)
assert(in.size === 1)
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po0.objectName)
// ignore this check as it behaves differently across spark versions
assert(po0.columns === Seq("key"))
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
- assert(accessType0 === AccessType.SELECT)
+ assert(accessType0 === AccessType.ALTER)
+
+ assert(out.size === 1)
+ val po1 = out.head
+ assert(po1.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po1.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assertEqualsIgnoreCase(reusedDb)(po1.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po1.objectName)
+ // the output-side object carries no column-level information
+ assert(po1.columns.isEmpty)
+ checkTableOwner(po1)
+ val accessType1 = ranger.AccessType(po1, operationType, isInput = false)
+ assert(accessType1 === AccessType.ALTER)
- assert(out.size === 0)
}
test("AnalyzePartitionCommand") {
val plan = sql(s"ANALYZE TABLE $reusedPartTable" +
s" PARTITION (pid = 1) COMPUTE STATISTICS").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
- assert(operationType === ANALYZE_TABLE)
+ assert(operationType === ALTERTABLE_PROPERTIES)
assert(in.size === 1)
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po0.objectName)
// ignore this check as it behaves differently across spark versions
assert(po0.columns === Seq("pid"))
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
- assert(accessType0 === AccessType.SELECT)
+ assert(accessType0 === AccessType.ALTER)
- assert(out.size === 0)
+ assert(out.size === 1)
+ val po1 = out.head
+ assert(po1.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po1.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assertEqualsIgnoreCase(reusedDb)(po1.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po1.objectName)
+ // the output-side object carries no column-level information
+ assert(po1.columns.isEmpty)
+ checkTableOwner(po1)
+ val accessType1 = ranger.AccessType(po1, operationType, isInput = false)
+ assert(accessType1 === AccessType.ALTER)
}
test("AnalyzeTableCommand") {
@@ -419,20 +442,30 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
.queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
- assert(operationType === ANALYZE_TABLE)
+ assert(operationType === ALTERTABLE_PROPERTIES)
assert(in.size === 1)
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po0.objectName)
// ignore this check as it behaves differently across spark versions
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
- assert(accessType0 === AccessType.SELECT)
+ assert(accessType0 === AccessType.ALTER)
- assert(out.size === 0)
+ assert(out.size === 1)
+ val po1 = out.head
+ assert(po1.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po1.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assertEqualsIgnoreCase(reusedDb)(po1.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po1.objectName)
+ // the output-side object carries no column-level information
+ assert(po1.columns.isEmpty)
+ checkTableOwner(po1)
+ val accessType1 = ranger.AccessType(po1, operationType, isInput = false)
+ assert(accessType1 === AccessType.ALTER)
}
test("AnalyzeTablesCommand") {
@@ -445,8 +478,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.DATABASE)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedDb)(po0.objectName)
// ignore this check as it behaves differently across spark versions
assert(po0.columns.isEmpty)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -463,8 +496,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedDb)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -482,8 +515,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
if (isSparkV32OrGreater) {
assert(po0.columns.head === "key")
checkTableOwner(po0)
@@ -505,8 +538,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
if (isSparkV32OrGreater) {
assert(po0.columns === Seq("key", "value"))
checkTableOwner(po0)
@@ -521,8 +554,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === (if (isSparkV2) null else "default"))
- assert(po.objectName === "CreateViewCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateViewCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -541,8 +574,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === (if (isSparkV2) null else "default"))
- assert(po.objectName === tableName)
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -588,9 +621,9 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
assert(po.catalog.isEmpty)
- val db = if (isSparkV33OrGreater) "default" else null
- assert(po.dbname === db)
- assert(po.objectName === "CreateFunctionCommand")
+ val db = if (isSparkV33OrGreater) defaultDb else null
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase("CreateFunctionCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -620,16 +653,16 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
assert(po.catalog.isEmpty)
- val db = if (isSparkV33OrGreater) "default" else null
- assert(po.dbname === db)
- assert(po.objectName === "DropFunctionCommand")
+ val db = if (isSparkV33OrGreater) defaultDb else null
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase("DropFunctionCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.DROP)
}
test("RefreshFunctionCommand") {
- assume(AuthZUtils.isSparkVersionAtLeast("3.1"))
+ assume(isSparkV31OrGreater)
sql(s"CREATE FUNCTION RefreshFunctionCommand AS '${getClass.getCanonicalName}'")
val plan = sql("REFRESH FUNCTION RefreshFunctionCommand")
.queryExecution.analyzed
@@ -641,9 +674,9 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.FUNCTION)
assert(po.catalog.isEmpty)
- val db = if (isSparkV33OrGreater) "default" else null
- assert(po.dbname === db)
- assert(po.objectName === "RefreshFunctionCommand")
+ val db = if (isSparkV33OrGreater) defaultDb else null
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase("RefreshFunctionCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.NONE)
@@ -658,8 +691,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -670,8 +703,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === "CreateTableLikeCommand")
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateTableLikeCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -689,8 +722,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -701,8 +734,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === "CreateTableLikeCommandWithoutDatabase")
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateTableLikeCommandWithoutDatabase")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -727,8 +760,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns === Seq("key"))
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -746,8 +779,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -757,7 +790,7 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("DescribeDatabaseCommand") {
- assume(isSparkVersionAtMost("3.2"))
+ assume(SPARK_RUNTIME_VERSION <= "3.2")
val plan = sql(s"DESC DATABASE $reusedDb").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
assert(operationType === DESCDATABASE)
@@ -766,8 +799,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedDb)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.USE)
@@ -785,8 +818,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.DATABASE)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedDb)(po0.objectName)
assert(po0.columns.isEmpty)
val accessType0 = ranger.AccessType(po0, operationType, isInput = false)
assert(accessType0 === AccessType.USE)
@@ -808,8 +841,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -824,8 +857,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -842,8 +875,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -860,8 +893,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -879,8 +912,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedPartTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedPartTableShort)(po0.objectName)
assert(po0.columns === Seq("pid"))
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -915,8 +948,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase tableName.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName.split("\\.").last)(po.objectName)
assert(po.columns.isEmpty)
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -931,8 +964,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedTableShort)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns.take(2) === Seq("key", "value"))
checkTableOwner(po)
}
@@ -956,7 +989,6 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("Query: CTE") {
- assume(!isSparkV2)
checkColumns(
s"""
|with t(c) as (select coalesce(max(key), pid, 1) from $reusedPartTable group by pid)
@@ -1007,8 +1039,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("value", "pid", "key"),
s"$reusedPartTable both 'key', 'value' and 'pid' should be authenticated")
@@ -1034,8 +1066,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("value", "key", "pid"),
s"$reusedPartTable both 'key', 'value' and 'pid' should be authenticated")
@@ -1064,8 +1096,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("key", "value"),
s"$reusedPartTable 'key' is the join key and 'pid' is omitted")
@@ -1093,8 +1125,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("key", "value"),
s"$reusedPartTable both 'key' and 'value' should be authenticated")
@@ -1123,8 +1155,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("key", "value"),
s"$reusedPartTable both 'key' and 'value' should be authenticated")
@@ -1149,8 +1181,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("key", "value"),
s"$reusedPartTable both 'key' and 'value' should be authenticated")
@@ -1175,8 +1207,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName startsWith reusedTableShort.toLowerCase)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertStartsWithIgnoreCase(reusedTableShort)(po.objectName)
assert(
po.columns === Seq("key", "value", "pid"),
s"$reusedPartTable both 'key', 'value' and 'pid' should be authenticated")
@@ -1219,8 +1251,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === getClass.getSimpleName)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns.head === "a")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1228,7 +1260,6 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
}
test("AlterTableChangeColumnCommand") {
- assume(!isSparkV2)
val plan = sql(s"ALTER TABLE $reusedTable" +
s" ALTER COLUMN value COMMENT 'alter column'").queryExecution.analyzed
val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
@@ -1239,8 +1270,8 @@ abstract class PrivilegesBuilderSuite extends AnyFunSuite
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName === getClass.getSimpleName)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns.head === "value")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1253,7 +1284,7 @@ class InMemoryPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
// some hive version does not support set database location
test("AlterDatabaseSetLocationCommand") {
- assume(isSparkVersionAtMost("3.2"))
+ assume(SPARK_RUNTIME_VERSION <= "3.2")
val newLoc = spark.conf.get("spark.sql.warehouse.dir") + "/new_db_location"
val plan = sql(s"ALTER DATABASE default SET LOCATION '$newLoc'")
.queryExecution.analyzed
@@ -1267,8 +1298,8 @@ class InMemoryPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === "default")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(defaultDb)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.ALTER)
@@ -1284,8 +1315,8 @@ class InMemoryPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns === Seq("key", "value"))
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -1296,8 +1327,8 @@ class InMemoryPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === (if (isSparkV2) null else "default"))
- assert(po.objectName === "CreateDataSourceTableAsSelectCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateDataSourceTableAsSelectCommand")(po.objectName)
if (catalogImpl == "hive") {
assert(po.columns === Seq("key", "value"))
} else {
@@ -1310,10 +1341,9 @@ class InMemoryPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
- override protected val catalogImpl: String = if (isSparkV2) "in-memory" else "hive"
+ override protected val catalogImpl: String = "hive"
test("AlterTableSerDePropertiesCommand") {
- assume(!isSparkV2)
withTable("AlterTableSerDePropertiesCommand") { t =>
sql(s"CREATE TABLE $t (key int, pid int) USING hive PARTITIONED BY (pid)")
sql(s"ALTER TABLE $t ADD IF NOT EXISTS PARTITION (pid=1)")
@@ -1328,8 +1358,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === t)
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(t)(po.objectName)
assert(po.columns.head === "pid")
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1338,7 +1368,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("CreateTableCommand") {
- assume(!isSparkV2)
withTable("CreateTableCommand") { _ =>
val plan = sql(s"CREATE TABLE CreateTableCommand(a int, b string) USING hive")
.queryExecution.analyzed
@@ -1350,8 +1379,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === "CreateTableCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateTableCommand")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -1359,7 +1388,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("CreateHiveTableAsSelectCommand") {
- assume(!isSparkV2)
val plan = sql(s"CREATE TABLE CreateHiveTableAsSelectCommand USING hive" +
s" AS SELECT key, value FROM $reusedTable")
.queryExecution.analyzed
@@ -1370,8 +1398,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns === Seq("key", "value"))
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -1382,15 +1410,14 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === "CreateHiveTableAsSelectCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("CreateHiveTableAsSelectCommand")(po.objectName)
assert(po.columns === Seq("key", "value"))
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
}
test("LoadDataCommand") {
- assume(!isSparkV2)
val dataPath = getClass.getClassLoader.getResource("data.txt").getPath
val tableName = reusedDb + "." + "LoadDataToTable"
withTable(tableName) { _ =>
@@ -1410,7 +1437,7 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = out.head
assert(po0.actionType === PrivilegeObjectActionType.INSERT_OVERWRITE)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
assert(po0.objectName equalsIgnoreCase tableName.split("\\.").last)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
@@ -1420,7 +1447,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("InsertIntoDatasourceDirCommand") {
- assume(!isSparkV2)
val tableDirectory = getClass.getResource("/").getPath + "table_directory"
val directory = File(tableDirectory).createDirectory()
val plan = sql(
@@ -1435,7 +1461,7 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
assert(po0.objectName equalsIgnoreCase reusedPartTable.split("\\.").last)
assert(po0.columns === Seq("key", "value", "pid"))
checkTableOwner(po0)
@@ -1446,7 +1472,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("InsertIntoDataSourceCommand") {
- assume(!isSparkV2)
val tableName = "InsertIntoDataSourceTable"
withTable(tableName) { _ =>
// sql(s"CREATE TABLE $tableName (a int, b string) USING parquet")
@@ -1480,8 +1505,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns === Seq("key", "value"))
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = true)
@@ -1493,8 +1518,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.INSERT)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase "default")
- assert(po.objectName equalsIgnoreCase tableName)
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName)(po.objectName)
assert(po.columns.isEmpty)
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1505,7 +1530,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("InsertIntoHadoopFsRelationCommand") {
- assume(!isSparkV2)
val tableName = "InsertIntoHadoopFsRelationTable"
withTable(tableName) { _ =>
sql(s"CREATE TABLE $tableName (a int, b string) USING parquet")
@@ -1523,8 +1547,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedTable.split("\\.").last)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po.objectName)
assert(po.columns === Seq("key", "value"))
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1536,8 +1560,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.INSERT)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase "default")
- assert(po.objectName equalsIgnoreCase tableName)
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName)(po.objectName)
assert(po.columns === Seq("a", "b"))
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1546,8 +1570,7 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
}
- test("InsertIntoHiveDirCommand") {
- assume(!isSparkV2)
+ test("InsertIntoDataSourceDirCommand") {
val tableDirectory = getClass.getResource("/").getPath + "table_directory"
val directory = File(tableDirectory).createDirectory()
val plan = sql(
@@ -1562,7 +1585,32 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assert(po0.objectName equalsIgnoreCase reusedPartTable.split("\\.").last)
+ assert(po0.columns === Seq("key", "value", "pid"))
+ checkTableOwner(po0)
+ val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
+ assert(accessType0 === AccessType.SELECT)
+
+ assert(out.isEmpty)
+ }
+
+ test("InsertIntoHiveDirCommand") {
+ val tableDirectory = getClass.getResource("/").getPath + "table_directory"
+ val directory = File(tableDirectory).createDirectory()
+ val plan = sql(
+ s"""
+ |INSERT OVERWRITE DIRECTORY '$directory.path'
+ |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ |SELECT * FROM $reusedPartTable""".stripMargin)
+ .queryExecution.analyzed
+ val (in, out, operationType) = PrivilegesBuilder.build(plan, spark)
+ assert(operationType === QUERY)
+ assert(in.size === 1)
+ val po0 = in.head
+ assert(po0.actionType === PrivilegeObjectActionType.OTHER)
+ assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
assert(po0.objectName equalsIgnoreCase reusedPartTable.split("\\.").last)
assert(po0.columns === Seq("key", "value", "pid"))
checkTableOwner(po0)
@@ -1573,7 +1621,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("InsertIntoHiveTableCommand") {
- assume(!isSparkV2)
val tableName = "InsertIntoHiveTable"
withTable(tableName) { _ =>
sql(s"CREATE TABLE $tableName (a int, b string) USING hive")
@@ -1592,8 +1639,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.INSERT)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname equalsIgnoreCase "default")
- assert(po.objectName equalsIgnoreCase tableName)
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(tableName)(po.objectName)
assert(po.columns === Seq("a", "b"))
checkTableOwner(po)
val accessType = ranger.AccessType(po, operationType, isInput = false)
@@ -1603,7 +1650,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("ShowCreateTableAsSerdeCommand") {
- assume(!isSparkV2)
withTable("ShowCreateTableAsSerdeCommand") { t =>
sql(s"CREATE TABLE $t (key int, pid int) USING hive PARTITIONED BY (pid)")
val plan = sql(s"SHOW CREATE TABLE $t AS SERDE").queryExecution.analyzed
@@ -1613,8 +1659,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
val po0 = in.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.dbname === "default")
- assert(po0.objectName === t)
+ assertEqualsIgnoreCase(defaultDb)(po0.dbname)
+ assertEqualsIgnoreCase(t)(po0.objectName)
assert(po0.columns.isEmpty)
checkTableOwner(po0)
val accessType0 = ranger.AccessType(po0, operationType, isInput = true)
@@ -1625,7 +1671,6 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
}
test("OptimizedCreateHiveTableAsSelectCommand") {
- assume(!isSparkV2)
val plan = sql(
s"CREATE TABLE OptimizedCreateHiveTableAsSelectCommand STORED AS parquet AS SELECT 1 as a")
.queryExecution.analyzed
@@ -1639,8 +1684,8 @@ class HiveCatalogPrivilegeBuilderSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
assert(po.catalog.isEmpty)
- assert(po.dbname === "default")
- assert(po.objectName === "OptimizedCreateHiveTableAsSelectCommand")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase("OptimizedCreateHiveTableAsSelectCommand")(po.objectName)
assert(po.columns === Seq("a"))
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/RangerTestResources.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/RangerTestResources.scala
new file mode 100644
index 00000000000..2297f73f9c4
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/RangerTestResources.scala
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.spark.authz
+
+object RangerTestUsers {
+ // authorized users used in policy generation
+ val admin = "admin"
+ val alice = "alice"
+ val bob = "bob"
+ val kent = "kent"
+ val permViewUser = "perm_view_user"
+ val ownerPlaceHolder = "{OWNER}"
+ val createOnlyUser = "create_only_user"
+ val defaultTableOwner = "default_table_owner"
+ val permViewOnlyUser = "user_perm_view_only"
+
+ // non-authorized users
+ val invisibleUser = "i_am_invisible"
+ val denyUser = "denyuser"
+ val denyUser2 = "denyuser2"
+ val someone = "someone"
+}
+
+object RangerTestNamespace {
+ val defaultDb = "default"
+ val sparkCatalog = "spark_catalog"
+ val icebergNamespace = "iceberg_ns"
+ val namespace1 = "ns1"
+ val namespace2 = "ns2"
+}
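The dominant change in the hunks above swaps infix `equalsIgnoreCase` assertions for curried helpers such as `assertEqualsIgnoreCase(expected)(actual)`. The helpers themselves live in `org.apache.kyuubi.util.AssertionUtils`, which this diff does not show; the following is a minimal sketch of their plausible shape, using only ScalaTest-style `assert` from Predef. The object name `AssertionUtilsSketch` and the exact matching rules (plain `String`, `Option[String]`, fallback to `==`) are assumptions for illustration, not the project's actual implementation.

```scala
// Hypothetical sketch of the curried, case-insensitive assertion helpers
// referenced throughout this diff. Putting the expected value in the first
// parameter list keeps call sites uniform (expected first, actual second)
// and lets failure messages report both sides.
object AssertionUtilsSketch {
  def assertEqualsIgnoreCase(expected: Any)(actual: Any): Unit = {
    // Compare strings case-insensitively; unwrap Options; fall back to ==.
    val ok = (expected, actual) match {
      case (e: String, a: String)             => e.equalsIgnoreCase(a)
      case (Some(e: String), Some(a: String)) => e.equalsIgnoreCase(a)
      case (e, a)                             => e == a
    }
    assert(ok, s"expected '$expected' (ignoring case) but got '$actual'")
  }

  def assertStartsWithIgnoreCase(prefix: String)(actual: String): Unit =
    assert(
      actual.toLowerCase.startsWith(prefix.toLowerCase),
      s"expected '$actual' to start with '$prefix' (ignoring case)")
}
```

This shape also explains why the diff can pass `Some(catalogV2)` directly to `assertEqualsIgnoreCase`: the helper accepts `Any` on both sides rather than only `String`.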
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/SparkSessionProvider.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/SparkSessionProvider.scala
index ce8d6bc0ccf..e6f70b4d1a6 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/SparkSessionProvider.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/SparkSessionProvider.scala
@@ -23,29 +23,25 @@ import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Row, SparkSession, SparkSessionExtensions}
-import org.scalatest.Assertions.convertToEqualizer
+import org.scalatest.Assertions._
import org.apache.kyuubi.Utils
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
trait SparkSessionProvider {
protected val catalogImpl: String
protected def format: String = if (catalogImpl == "hive") "hive" else "parquet"
- protected val isSparkV2: Boolean = isSparkVersionAtMost("2.4")
- protected val isSparkV31OrGreater: Boolean = isSparkVersionAtLeast("3.1")
- protected val isSparkV32OrGreater: Boolean = isSparkVersionAtLeast("3.2")
- protected val isSparkV33OrGreater: Boolean = isSparkVersionAtLeast("3.3")
- protected val extension: SparkSessionExtensions => Unit = _ => Unit
+ protected val extension: SparkSessionExtensions => Unit = _ => ()
protected val sqlExtensions: String = ""
- protected val defaultTableOwner = "default_table_owner"
protected val extraSparkConf: SparkConf = new SparkConf()
protected lazy val spark: SparkSession = {
val metastore = {
val path = Utils.createTempDir(prefix = "hms")
- Files.delete(path)
+ Files.deleteIfExists(path)
path
}
val ret = SparkSession.builder()
@@ -83,12 +79,12 @@ trait SparkSessionProvider {
f
} finally {
res.foreach {
- case (t, "table") => doAs("admin", sql(s"DROP TABLE IF EXISTS $t"))
- case (db, "database") => doAs("admin", sql(s"DROP DATABASE IF EXISTS $db"))
- case (fn, "function") => doAs("admin", sql(s"DROP FUNCTION IF EXISTS $fn"))
- case (view, "view") => doAs("admin", sql(s"DROP VIEW IF EXISTS $view"))
+ case (t, "table") => doAs(admin, sql(s"DROP TABLE IF EXISTS $t"))
+ case (db, "database") => doAs(admin, sql(s"DROP DATABASE IF EXISTS $db"))
+ case (fn, "function") => doAs(admin, sql(s"DROP FUNCTION IF EXISTS $fn"))
+ case (view, "view") => doAs(admin, sql(s"DROP VIEW IF EXISTS $view"))
case (cacheTable, "cache") => if (isSparkV32OrGreater) {
- doAs("admin", sql(s"UNCACHE TABLE IF EXISTS $cacheTable"))
+ doAs(admin, sql(s"UNCACHE TABLE IF EXISTS $cacheTable"))
}
case (_, e) =>
throw new RuntimeException(s"the resource whose resource type is $e cannot be cleared")
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2CommandsPrivilegesSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2CommandsPrivilegesSuite.scala
index dede8142693..3ebea1ce9d9 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2CommandsPrivilegesSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2CommandsPrivilegesSuite.scala
@@ -23,8 +23,11 @@ import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.execution.QueryExecution
import org.apache.kyuubi.plugin.spark.authz.OperationType._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
import org.apache.kyuubi.plugin.spark.authz.ranger.AccessType
import org.apache.kyuubi.plugin.spark.authz.serde.{Database, DB_COMMAND_SPECS}
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.AssertionUtils._
abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
@@ -99,9 +102,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
assert(po.owner.isEmpty)
val accessType = AccessType(po, operationType, isInput = false)
@@ -121,9 +124,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po0 = inputs.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.catalog === None)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTableShort)
+ assert(po0.catalog.isEmpty)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.take(2) === Seq("key", "value"))
checkTableOwner(po0)
@@ -131,9 +134,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
assert(po.owner.isEmpty)
val accessType = AccessType(po, operationType, isInput = false)
@@ -154,9 +157,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
assert(po.owner.isEmpty)
val accessType = AccessType(po, operationType, isInput = false)
@@ -176,9 +179,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po0 = inputs.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.catalog === None)
- assert(po0.dbname equalsIgnoreCase reusedDb)
- assert(po0.objectName equalsIgnoreCase reusedTableShort)
+ assert(po0.catalog.isEmpty)
+ assertEqualsIgnoreCase(reusedDb)(po0.dbname)
+ assertEqualsIgnoreCase(reusedTableShort)(po0.objectName)
assert(po0.columns.take(2) === Seq("key", "value"))
checkTableOwner(po0)
@@ -186,9 +189,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
assert(po.owner.isEmpty)
val accessType = AccessType(po, operationType, isInput = false)
@@ -207,9 +210,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.INSERT)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -229,9 +232,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -249,9 +252,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -267,9 +270,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.INSERT_OVERWRITE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -290,9 +293,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.INSERT_OVERWRITE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -315,9 +318,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -337,9 +340,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -359,9 +362,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -382,9 +385,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -403,9 +406,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -425,9 +428,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -452,9 +455,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po0 = inputs.head
assert(po0.actionType === PrivilegeObjectActionType.OTHER)
assert(po0.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po0.catalog === Some(catalogV2))
- assert(po0.dbname === namespace)
- assert(po0.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po0.catalog)
+ assertEqualsIgnoreCase(namespace)(po0.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po0.objectName)
assert(po0.columns === Seq("key", "value"))
checkV2TableOwner(po0)
@@ -462,9 +465,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.UPDATE)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -485,9 +488,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogPartTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogPartTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -506,9 +509,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -523,9 +526,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = inputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === catalogTableShort)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(catalogTableShort)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = true)
@@ -550,9 +553,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -575,9 +578,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -600,9 +603,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -625,9 +628,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -650,9 +653,9 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val po = outputs.head
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.TABLE_OR_VIEW)
- assert(po.catalog === Some(catalogV2))
- assert(po.dbname === namespace)
- assert(po.objectName === table)
+ assertEqualsIgnoreCase(Some(catalogV2))(po.catalog)
+ assertEqualsIgnoreCase(namespace)(po.dbname)
+ assertEqualsIgnoreCase(table)(po.objectName)
assert(po.columns.isEmpty)
checkV2TableOwner(po)
val accessType = AccessType(po, operationType, isInput = false)
@@ -667,7 +670,7 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
val spec = DB_COMMAND_SPECS(plan1.getClass.getName)
var db: Database = null
spec.databaseDescs.find { d =>
- Try(db = d.extract(plan1)).isSuccess
+ Try { db = d.extract(plan1) }.isSuccess
}
withClue(sql1) {
assert(db.catalog === None)
@@ -688,8 +691,8 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.get === sparkSessionCatalogName)
- assert(po.dbname === "default")
- assert(po.objectName === "default")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(defaultDb)(po.objectName)
assert(po.columns.isEmpty)
}
@@ -707,8 +710,8 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.get === sparkSessionCatalogName)
- assert(po.dbname === "CreateNamespace")
- assert(po.objectName === "CreateNamespace")
+ assertEqualsIgnoreCase("CreateNamespace")(po.dbname)
+ assertEqualsIgnoreCase("CreateNamespace")(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.CREATE)
@@ -732,8 +735,8 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.get === sparkSessionCatalogName)
- assert(po.dbname === "default")
- assert(po.objectName === "default")
+ assertEqualsIgnoreCase(defaultDb)(po.dbname)
+ assertEqualsIgnoreCase(defaultDb)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.ALTER)
@@ -751,8 +754,8 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.get === sparkSessionCatalogName)
- assert(po.dbname equalsIgnoreCase reusedDb)
- assert(po.objectName equalsIgnoreCase reusedDb)
+ assertEqualsIgnoreCase(reusedDb)(po.dbname)
+ assertEqualsIgnoreCase(reusedDb)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.USE)
@@ -775,8 +778,8 @@ abstract class V2CommandsPrivilegesSuite extends PrivilegesBuilderSuite {
assert(po.actionType === PrivilegeObjectActionType.OTHER)
assert(po.privilegeObjectType === PrivilegeObjectType.DATABASE)
assert(po.catalog.get === sparkSessionCatalogName)
- assert(po.dbname === "DropNameSpace")
- assert(po.objectName === "DropNameSpace")
+ assertEqualsIgnoreCase(db)(po.dbname)
+ assertEqualsIgnoreCase(db)(po.objectName)
assert(po.columns.isEmpty)
val accessType = ranger.AccessType(po, operationType, isInput = false)
assert(accessType === AccessType.DROP)
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2JdbcTableCatalogPrivilegesBuilderSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2JdbcTableCatalogPrivilegesBuilderSuite.scala
index f85689406dc..1037d9811ee 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2JdbcTableCatalogPrivilegesBuilderSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/V2JdbcTableCatalogPrivilegesBuilderSuite.scala
@@ -23,6 +23,8 @@ import scala.util.Try
import org.scalatest.Outcome
import org.apache.kyuubi.plugin.spark.authz.serde._
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.AssertionUtils._
class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite {
override protected val catalogImpl: String = "in-memory"
@@ -77,12 +79,12 @@ class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite
val spec = TABLE_COMMAND_SPECS(plan.getClass.getName)
var table: Table = null
spec.tableDescs.find { d =>
- Try(table = d.extract(plan, spark).get).isSuccess
+ Try { table = d.extract(plan, spark).get }.isSuccess
}
withClue(str) {
- assert(table.catalog === Some(catalogV2))
- assert(table.database === Some(ns1))
- assert(table.table === tbl)
+ assertEqualsIgnoreCase(Some(catalogV2))(table.catalog)
+ assertEqualsIgnoreCase(Some(ns1))(table.database)
+ assertEqualsIgnoreCase(tbl)(table.table)
assert(table.owner.isEmpty)
}
}
@@ -102,12 +104,12 @@ class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite
val spec = TABLE_COMMAND_SPECS(plan.getClass.getName)
var table: Table = null
spec.tableDescs.find { d =>
- Try(table = d.extract(plan, spark).get).isSuccess
+ Try { table = d.extract(plan, spark).get }.isSuccess
}
withClue(sql1) {
- assert(table.catalog === Some(catalogV2))
- assert(table.database === Some(ns1))
- assert(table.table === tbl)
+ assertEqualsIgnoreCase(Some(catalogV2))(table.catalog)
+ assertEqualsIgnoreCase(Some(ns1))(table.database)
+ assertEqualsIgnoreCase(tbl)(table.table)
assert(table.owner.isEmpty)
}
}
@@ -125,11 +127,11 @@ class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite
val plan = executePlan(sql1).analyzed
val spec = TABLE_COMMAND_SPECS(plan.getClass.getName)
var table: Table = null
- spec.tableDescs.find { d => Try(table = d.extract(plan, spark).get).isSuccess }
+ spec.tableDescs.find { d => Try { table = d.extract(plan, spark).get }.isSuccess }
withClue(sql1) {
- assert(table.catalog === Some(catalogV2))
- assert(table.database === Some(ns1))
- assert(table.table === tbl)
+ assertEqualsIgnoreCase(Some(catalogV2))(table.catalog)
+ assertEqualsIgnoreCase(Some(ns1))(table.database)
+ assertEqualsIgnoreCase(tbl)(table.table)
assert(table.owner.isEmpty)
}
}
@@ -144,11 +146,11 @@ class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite
val spec = DB_COMMAND_SPECS(plan.getClass.getName)
var db: Database = null
spec.databaseDescs.find { d =>
- Try(db = d.extract(plan)).isSuccess
+ Try { db = d.extract(plan) }.isSuccess
}
withClue(sql) {
- assert(db.catalog === Some(catalogV2))
- assert(db.database === ns1)
+ assertEqualsIgnoreCase(Some(catalogV2))(db.catalog)
+ assertEqualsIgnoreCase(ns1)(db.database)
}
}
@@ -163,11 +165,11 @@ class V2JdbcTableCatalogPrivilegesBuilderSuite extends V2CommandsPrivilegesSuite
val spec = DB_COMMAND_SPECS(plan.getClass.getName)
var db: Database = null
spec.databaseDescs.find { d =>
- Try(db = d.extract(plan)).isSuccess
+ Try { db = d.extract(plan) }.isSuccess
}
withClue(sql1) {
- assert(db.catalog === Some(catalogV2))
- assert(db.database === ns1)
+ assertEqualsIgnoreCase(Some(catalogV2))(db.catalog)
+ assertEqualsIgnoreCase(ns1)(db.database)
}
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/DatabaseCommands.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/DatabaseCommands.scala
index e947579e9f7..a61c142edb5 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/DatabaseCommands.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/DatabaseCommands.scala
@@ -58,9 +58,10 @@ object DatabaseCommands {
"namespace",
classOf[StringSeqDatabaseExtractor],
catalogDesc = Some(CatalogDesc()))
+ val databaseDesc3 = DatabaseDesc("name", classOf[ResolvedNamespaceDatabaseExtractor])
DatabaseCommandSpec(
"org.apache.spark.sql.catalyst.plans.logical.CreateNamespace",
- Seq(databaseDesc1, databaseDesc2),
+ Seq(databaseDesc1, databaseDesc2, databaseDesc3),
CREATEDATABASE)
}
@@ -97,12 +98,12 @@ object DatabaseCommands {
val SetCatalogAndNamespace = {
val cmd = "org.apache.spark.sql.catalyst.plans.logical.SetCatalogAndNamespace"
- val databaseDesc1 =
+ val resolvedDbObjectDatabaseDesc =
DatabaseDesc(
"child",
classOf[ResolvedDBObjectNameDatabaseExtractor],
isInput = true)
- val databaseDesc2 =
+ val stringSeqOptionDatabaseDesc =
DatabaseDesc(
"namespace",
classOf[StringSeqOptionDatabaseExtractor],
@@ -110,7 +111,15 @@ object DatabaseCommands {
fieldName = "catalogName",
fieldExtractor = classOf[StringOptionCatalogExtractor])),
isInput = true)
- DatabaseCommandSpec(cmd, Seq(databaseDesc1, databaseDesc2), SWITCHDATABASE)
+ val resolvedNamespaceDatabaseDesc =
+ DatabaseDesc(
+ "child",
+ classOf[ResolvedNamespaceDatabaseExtractor],
+ isInput = true)
+ DatabaseCommandSpec(
+ cmd,
+ Seq(resolvedNamespaceDatabaseDesc, resolvedDbObjectDatabaseDesc, stringSeqOptionDatabaseDesc),
+ SWITCHDATABASE)
}
val SetNamespace = {
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/FunctionCommands.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/FunctionCommands.scala
index 46c7f0efac5..1822e80fc8a 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/FunctionCommands.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/FunctionCommands.scala
@@ -35,8 +35,12 @@ object FunctionCommands {
"functionName",
classOf[StringFunctionExtractor],
Some(databaseDesc),
- Some(functionTypeDesc))
- FunctionCommandSpec(cmd, Seq(functionDesc), CREATEFUNCTION)
+ functionTypeDesc = Some(functionTypeDesc))
+ val functionIdentifierDesc = FunctionDesc(
+ "identifier",
+ classOf[FunctionIdentifierFunctionExtractor],
+ functionTypeDesc = Some(functionTypeDesc))
+ FunctionCommandSpec(cmd, Seq(functionIdentifierDesc, functionDesc), CREATEFUNCTION)
}
val DescribeFunction = {
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/IcebergCommands.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/IcebergCommands.scala
index 208e73c51b3..355143c402c 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/IcebergCommands.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/IcebergCommands.scala
@@ -17,6 +17,7 @@
package org.apache.kyuubi.plugin.spark.authz.gen
+import org.apache.kyuubi.plugin.spark.authz.OperationType
import org.apache.kyuubi.plugin.spark.authz.PrivilegeObjectActionType._
import org.apache.kyuubi.plugin.spark.authz.serde._
@@ -49,7 +50,14 @@ object IcebergCommands {
TableCommandSpec(cmd, Seq(tableDesc), queryDescs = Seq(queryDesc))
}
+ val CallProcedure = {
+ val cmd = "org.apache.spark.sql.catalyst.plans.logical.Call"
+ val td = TableDesc("args", classOf[ExpressionSeqTableExtractor])
+ TableCommandSpec(cmd, Seq(td), opType = OperationType.ALTERTABLE_PROPERTIES)
+ }
+
val data: Array[TableCommandSpec] = Array(
+ CallProcedure,
DeleteFromIcebergTable,
UpdateIcebergTable,
MergeIntoIcebergTable,
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/JsonSpecFileGenerator.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/JsonSpecFileGenerator.scala
index 7c7ed138b27..855e25e87ea 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/JsonSpecFileGenerator.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/JsonSpecFileGenerator.scala
@@ -18,37 +18,62 @@
package org.apache.kyuubi.plugin.spark.authz.gen
import java.nio.charset.StandardCharsets
-import java.nio.file.{Files, Paths}
+import java.nio.file.{Files, Paths, StandardOpenOption}
+
+//scalastyle:off
+import org.scalatest.funsuite.AnyFunSuite
import org.apache.kyuubi.plugin.spark.authz.serde.{mapper, CommandSpec}
+import org.apache.kyuubi.util.AssertionUtils._
/**
* Generates the default command specs to src/main/resources dir.
*
- * Usage:
- * mvn scala:run -DmainClass=this class -pl :kyuubi-spark-authz_2.12
+ * To run the test suite:
+ * {{{
+ * KYUUBI_UPDATE=0 dev/gen/gen_ranger_spec_json.sh
+ * }}}
+ *
+ * To regenerate the command spec json files:
+ * {{{
+ * dev/gen/gen_ranger_spec_json.sh
+ * }}}
*/
-object JsonSpecFileGenerator {
-
- def main(args: Array[String]): Unit = {
+class JsonSpecFileGenerator extends AnyFunSuite {
+ // scalastyle:on
+ test("check spec json files") {
writeCommandSpecJson("database", DatabaseCommands.data)
writeCommandSpecJson("table", TableCommands.data ++ IcebergCommands.data)
writeCommandSpecJson("function", FunctionCommands.data)
writeCommandSpecJson("scan", Scans.data)
}
- def writeCommandSpecJson[T <: CommandSpec](commandType: String, specArr: Array[T]): Unit = {
+ def writeCommandSpecJson[T <: CommandSpec](
+ commandType: String,
+ specArr: Array[T]): Unit = {
val pluginHome = getClass.getProtectionDomain.getCodeSource.getLocation.getPath
.split("target").head
val filename = s"${commandType}_command_spec.json"
- val writer = {
- val p = Paths.get(pluginHome, "src", "main", "resources", filename)
- Files.newBufferedWriter(p, StandardCharsets.UTF_8)
+ val filePath = Paths.get(pluginHome, "src", "main", "resources", filename)
+
+ val generatedStr = mapper.writerWithDefaultPrettyPrinter()
+ .writeValueAsString(specArr.sortBy(_.classname))
+
+ if (sys.env.get("KYUUBI_UPDATE").contains("1")) {
+ // scalastyle:off println
+ println(s"writing ${specArr.length} specs to $filename")
+ // scalastyle:on println
+ Files.write(
+ filePath,
+ generatedStr.getBytes(StandardCharsets.UTF_8),
+ StandardOpenOption.CREATE,
+ StandardOpenOption.TRUNCATE_EXISTING)
+ } else {
+ assertFileContent(
+ filePath,
+ Seq(generatedStr),
+ "dev/gen/gen_ranger_spec_json.sh",
+ splitFirstExpectedLine = true)
}
- // scalastyle:off println
- println(s"writing ${specArr.length} specs to $filename")
- // scalastyle:on println
- mapper.writerWithDefaultPrettyPrinter().writeValue(writer, specArr.sortBy(_.classname))
- writer.close()
}
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/Scans.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/Scans.scala
index 7bd8260bba5..b2c1868a26d 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/Scans.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/Scans.scala
@@ -18,6 +18,7 @@
package org.apache.kyuubi.plugin.spark.authz.gen
import org.apache.kyuubi.plugin.spark.authz.serde._
+import org.apache.kyuubi.plugin.spark.authz.serde.FunctionType._
object Scans {
@@ -57,9 +58,34 @@ object Scans {
ScanSpec(r, Seq(tableDesc))
}
+ val HiveSimpleUDF = {
+ ScanSpec(
+ "org.apache.spark.sql.hive.HiveSimpleUDF",
+ Seq.empty,
+ Seq(FunctionDesc(
+ "name",
+ classOf[QualifiedNameStringFunctionExtractor],
+ functionTypeDesc = Some(FunctionTypeDesc(
+ "name",
+ classOf[FunctionNameFunctionTypeExtractor],
+ Seq(TEMP, SYSTEM))),
+ isInput = true)))
+ }
+
+ val HiveGenericUDF = HiveSimpleUDF.copy(classname = "org.apache.spark.sql.hive.HiveGenericUDF")
+
+ val HiveUDAFFunction = HiveSimpleUDF.copy(classname =
+ "org.apache.spark.sql.hive.HiveUDAFFunction")
+
+ val HiveGenericUDTF = HiveSimpleUDF.copy(classname = "org.apache.spark.sql.hive.HiveGenericUDTF")
+
val data: Array[ScanSpec] = Array(
HiveTableRelation,
LogicalRelation,
DataSourceV2Relation,
- PermanentViewMarker)
+ PermanentViewMarker,
+ HiveSimpleUDF,
+ HiveGenericUDF,
+ HiveUDAFFunction,
+ HiveGenericUDTF)
}
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/TableCommands.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/TableCommands.scala
index a8b8121e2b0..ca2ee92948e 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/TableCommands.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/gen/TableCommands.scala
@@ -30,6 +30,8 @@ object TableCommands {
val resolvedTableDesc = TableDesc("child", classOf[ResolvedTableTableExtractor])
val resolvedDbObjectNameDesc =
TableDesc("child", classOf[ResolvedDbObjectNameTableExtractor])
+ val resolvedIdentifierTableDesc =
+ TableDesc("child", classOf[ResolvedIdentifierTableExtractor])
val overwriteActionTypeDesc =
ActionTypeDesc("overwrite", classOf[OverwriteOrInsertActionTypeExtractor])
val queryQueryDesc = QueryDesc("query")
@@ -179,7 +181,8 @@ object TableCommands {
val cd2 = cd1.copy(fieldExtractor = classOf[StringSeqOptionColumnExtractor])
val td1 = tableIdentDesc.copy(columnDesc = Some(cd1), isInput = true)
val td2 = td1.copy(columnDesc = Some(cd2))
- TableCommandSpec(cmd, Seq(td1, td2), ANALYZE_TABLE)
+ // AnalyzeColumn updates table properties, so ALTERTABLE_PROPERTIES is used here
+ TableCommandSpec(cmd, Seq(tableIdentDesc, td1, td2), ALTERTABLE_PROPERTIES)
}
val AnalyzePartition = {
@@ -187,16 +190,18 @@ object TableCommands {
val columnDesc = ColumnDesc("partitionSpec", classOf[PartitionColumnExtractor])
TableCommandSpec(
cmd,
- Seq(tableIdentDesc.copy(columnDesc = Some(columnDesc), isInput = true)),
- ANALYZE_TABLE)
+ // AnalyzePartition updates table properties, so ALTERTABLE_PROPERTIES is used here
+ Seq(tableIdentDesc, tableIdentDesc.copy(columnDesc = Some(columnDesc), isInput = true)),
+ ALTERTABLE_PROPERTIES)
}
val AnalyzeTable = {
val cmd = "org.apache.spark.sql.execution.command.AnalyzeTableCommand"
TableCommandSpec(
cmd,
- Seq(tableIdentDesc.copy(isInput = true)),
- ANALYZE_TABLE)
+ // AnalyzeTable updates table properties, so ALTERTABLE_PROPERTIES is used here
+ Seq(tableIdentDesc, tableIdentDesc.copy(isInput = true)),
+ ALTERTABLE_PROPERTIES)
}
val CreateTableV2 = {
@@ -205,7 +210,10 @@ object TableCommands {
"tableName",
classOf[IdentifierTableExtractor],
catalogDesc = Some(CatalogDesc()))
- TableCommandSpec(cmd, Seq(tableDesc, resolvedDbObjectNameDesc), CREATETABLE)
+ TableCommandSpec(
+ cmd,
+ Seq(resolvedIdentifierTableDesc, tableDesc, resolvedDbObjectNameDesc),
+ CREATETABLE)
}
val CreateV2Table = {
@@ -225,7 +233,10 @@ object TableCommands {
catalogDesc = Some(CatalogDesc()))
TableCommandSpec(
cmd,
- Seq(tableDesc, resolvedDbObjectNameDesc.copy(fieldName = "left")),
+ Seq(
+ resolvedIdentifierTableDesc.copy(fieldName = "left"),
+ tableDesc,
+ resolvedDbObjectNameDesc.copy(fieldName = "left")),
CREATETABLE_AS_SELECT,
Seq(queryQueryDesc))
}
@@ -438,8 +449,7 @@ object TableCommands {
val DropTableV2 = {
val cmd = "org.apache.spark.sql.catalyst.plans.logical.DropTable"
- val tableDesc1 = resolvedTableDesc
- TableCommandSpec(cmd, Seq(tableDesc1), DROPTABLE)
+ TableCommandSpec(cmd, Seq(resolvedIdentifierTableDesc, resolvedTableDesc), DROPTABLE)
}
val MergeIntoTable = {
@@ -600,8 +610,6 @@ object TableCommands {
AnalyzeColumn,
AnalyzePartition,
AnalyzeTable,
- AnalyzeTable.copy(classname =
- "org.apache.spark.sql.execution.command.AnalyzeTablesCommand"),
AppendDataV2,
CacheTable,
CacheTableAsSelect,
@@ -637,7 +645,7 @@ object TableCommands {
"org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand"),
InsertIntoHadoopFsRelationCommand,
InsertIntoDataSourceDir.copy(classname =
- "org.apache.spark.sql.execution.datasources.InsertIntoHiveDirCommand"),
+ "org.apache.spark.sql.hive.execution.InsertIntoHiveDirCommand"),
InsertIntoHiveTable,
LoadData,
MergeIntoTable,
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/IcebergCatalogRangerSparkExtensionSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/IcebergCatalogRangerSparkExtensionSuite.scala
index 6b1cedf786f..55fde3b685b 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/IcebergCatalogRangerSparkExtensionSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/IcebergCatalogRangerSparkExtensionSuite.scala
@@ -23,11 +23,17 @@ import org.scalatest.Outcome
import org.apache.kyuubi.Utils
import org.apache.kyuubi.plugin.spark.authz.AccessControlException
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.tags.IcebergTest
+import org.apache.kyuubi.util.AssertionUtils._
/**
* Tests for RangerSparkExtensionSuite
* on Iceberg catalog with DataSource V2 API.
*/
+@IcebergTest
class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
override protected val catalogImpl: String = "hive"
override protected val sqlExtensions: String =
@@ -36,7 +42,7 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
else ""
val catalogV2 = "local"
- val namespace1 = "iceberg_ns"
+ val namespace1 = icebergNamespace
val table1 = "table1"
val outputTable1 = "outputTable1"
@@ -57,18 +63,18 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
super.beforeAll()
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace1"))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace1"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$table1" +
" (id int, name string, city string) USING iceberg"))
doAs(
- "admin",
+ admin,
sql(s"INSERT INTO $catalogV2.$namespace1.$table1" +
" (id , name , city ) VALUES (1, 'liangbowen','Guangzhou')"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$outputTable1" +
" (id int, name string, city string) USING iceberg"))
}
@@ -93,44 +99,37 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
// MergeIntoTable: Using a MERGE INTO Statement
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(mergeIntoSql)))
assert(e1.getMessage.contains(s"does not have [select] privilege" +
s" on [$namespace1/$table1/id]"))
- try {
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- true)
+ withSingleCallEnabled {
val e2 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(mergeIntoSql)))
assert(e2.getMessage.contains(s"does not have" +
s" [select] privilege" +
s" on [$namespace1/$table1/id,$namespace1/table1/name,$namespace1/$table1/city]," +
s" [update] privilege on [$namespace1/$outputTable1]"))
- } finally {
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- false)
}
- doAs("admin", sql(mergeIntoSql))
+ doAs(admin, sql(mergeIntoSql))
}
test("[KYUUBI #3515] UPDATE TABLE") {
// UpdateTable
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"UPDATE $catalogV2.$namespace1.$table1 SET city='Guangzhou' " +
" WHERE id=1")))
assert(e1.getMessage.contains(s"does not have [update] privilege" +
s" on [$namespace1/$table1]"))
doAs(
- "admin",
+ admin,
sql(s"UPDATE $catalogV2.$namespace1.$table1 SET city='Guangzhou' " +
" WHERE id=1"))
}
@@ -138,11 +137,11 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
test("[KYUUBI #3515] DELETE FROM TABLE") {
// DeleteFromTable
val e6 = intercept[AccessControlException](
- doAs("someone", sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=2")))
+ doAs(someone, sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=2")))
assert(e6.getMessage.contains(s"does not have [update] privilege" +
s" on [$namespace1/$table1]"))
- doAs("admin", sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=2"))
+ doAs(admin, sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=2"))
}
test("[KYUUBI #3666] Support {OWNER} variable for queries run on CatalogV2") {
@@ -163,7 +162,7 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
}.isSuccess))
doAs(
- "create_only_user", {
+ createOnlyUser, {
val e = intercept[AccessControlException](sql(select).collect())
assert(e.getMessage === errorMessage("select", s"$namespace1/$table/key"))
})
@@ -178,17 +177,17 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
(s"$catalogV2.default.src", "table"),
(s"$catalogV2.default.outputTable2", "table"))) {
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.default.src" +
" (id int, name string, key string) USING iceberg"))
doAs(
- "admin",
+ admin,
sql(s"INSERT INTO $catalogV2.default.src" +
" (id , name , key ) VALUES " +
"(1, 'liangbowen1','10')" +
", (2, 'liangbowen2','20')"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$outputTable2" +
" (id int, name string, key string) USING iceberg"))
@@ -200,20 +199,20 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
|WHEN NOT MATCHED THEN INSERT (id, name, key) VALUES (source.id, source.name, source.key)
""".stripMargin
- doAs("admin", sql(mergeIntoSql))
+ doAs(admin, sql(mergeIntoSql))
doAs(
- "admin", {
+ admin, {
val countOutputTable =
sql(s"select count(1) from $catalogV2.$namespace1.$outputTable2").collect()
val rowCount = countOutputTable(0).get(0)
assert(rowCount === 2)
})
- doAs("admin", sql(s"truncate table $catalogV2.$namespace1.$outputTable2"))
+ doAs(admin, sql(s"truncate table $catalogV2.$namespace1.$outputTable2"))
// source table with row filter `key`<20
- doAs("bob", sql(mergeIntoSql))
+ doAs(bob, sql(mergeIntoSql))
doAs(
- "admin", {
+ admin, {
val countOutputTable =
sql(s"select count(1) from $catalogV2.$namespace1.$outputTable2").collect()
val rowCount = countOutputTable(0).get(0)
@@ -224,8 +223,67 @@ class IcebergCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite
test("[KYUUBI #4255] DESCRIBE TABLE") {
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"DESCRIBE TABLE $catalogV2.$namespace1.$table1").explain()))
+ doAs(someone, sql(s"DESCRIBE TABLE $catalogV2.$namespace1.$table1").explain()))
assert(e1.getMessage.contains(s"does not have [select] privilege" +
s" on [$namespace1/$table1]"))
}
+
+ test("CALL RewriteDataFilesProcedure") {
+ val tableName = "table_select_call_command_table"
+ val table = s"$catalogV2.$namespace1.$tableName"
+ val initDataFilesCount = 2
+ val rewriteDataFiles1 = s"CALL $catalogV2.system.rewrite_data_files " +
+ s"(table => '$table', options => map('min-input-files','$initDataFilesCount'))"
+ val rewriteDataFiles2 = s"CALL $catalogV2.system.rewrite_data_files " +
+ s"(table => '$table', options => map('min-input-files','${initDataFilesCount + 1}'))"
+
+ withCleanTmpResources(Seq((table, "table"))) {
+ doAs(
+ admin, {
+ sql(s"CREATE TABLE IF NOT EXISTS $table (id int, name string) USING iceberg")
+ // insert 2 data files
+ (0 until initDataFilesCount)
+ .foreach(i => sql(s"INSERT INTO $table VALUES ($i, 'user_$i')"))
+ })
+
+ interceptContains[AccessControlException](doAs(someone, sql(rewriteDataFiles1)))(
+ s"does not have [alter] privilege on [$namespace1/$tableName]")
+ interceptContains[AccessControlException](doAs(someone, sql(rewriteDataFiles2)))(
+ s"does not have [alter] privilege on [$namespace1/$tableName]")
+
+ /**
+ * Case 1: The number of input data files is greater than or equal to the minimum expected.
+ * Two physical plans are triggered
+ * when ( input-files(2) >= min-input-files(2) ):
+ *
+ * == Physical Plan 1 ==
+ * Call (1)
+ *
+ * == Physical Plan 2 ==
+ * AppendData (3)
+ * +- * ColumnarToRow (2)
+ * +- BatchScan local.iceberg_ns.call_command_table (1)
+ */
+ doAs(
+ admin, {
+ val result1 = sql(rewriteDataFiles1).collect()
+ // the procedure reports both input data files as rewritten
+ assert(result1(0)(0) === initDataFilesCount)
+ })
+
+ /**
+ * Case 2: The number of input data files is less than the minimum expected.
+ * Only one physical plan is triggered
+ * when ( input-files(2) < min-input-files(3) ):
+ *
+ * == Physical Plan ==
+ * Call (1)
+ */
+ doAs(
+ admin, {
+ val result2 = sql(rewriteDataFiles2).collect()
+ assert(result2(0)(0) === 0)
+ })
+ }
+ }
}
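Every test above funnels through `doAs(user, f)`, which runs the block as an impersonated user. The suite's real helper (defined in `SparkSessionProvider`, typically backed by Hadoop's `UserGroupInformation` proxy-user impersonation) is not shown in this diff; a minimal, self-contained sketch of the same idiom, with a thread-local stand-in for the current user:

```scala
object DoAsSketch {
  // Thread-local "current user" standing in for Hadoop's UserGroupInformation.
  // Assumption: the suite's doAs delegates to UGI impersonation; this sketch
  // only demonstrates the save/run/restore shape of the helper.
  private val currentUser = new ThreadLocal[String] {
    override def initialValue(): String = "anonymous"
  }

  // Run `f` as `user`, restoring the previous user even if `f` throws.
  def doAs[T](user: String)(f: => T): T = {
    val previous = currentUser.get()
    currentUser.set(user)
    try f
    finally currentUser.set(previous)
  }

  def main(args: Array[String]): Unit = {
    assert(currentUser.get() == "anonymous")
    val who = doAs("bob") { currentUser.get() }
    assert(who == "bob")
    assert(currentUser.get() == "anonymous") // restored after the block
  }
}
```

The save/restore in `finally` is what makes nested `doAs` calls safe: the outer user is reinstated even when the inner block throws an `AccessControlException`, as many of these tests deliberately do.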
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
index 4ccf15cba98..0c307195cee 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
@@ -31,9 +31,11 @@ import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite
import org.apache.kyuubi.plugin.spark.authz.{AccessControlException, SparkSessionProvider}
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.KYUUBI_AUTHZ_TAG
-import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils.getFieldVal
-
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
abstract class RangerSparkExtensionSuite extends AnyFunSuite
with SparkSessionProvider with BeforeAndAfterAll {
// scalastyle:on
@@ -87,8 +89,23 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
}
}
+ /**
+ * Runs `f` with authorization in single-call mode enabled,
+ * and disables single-call mode again afterwards.
+ */
+ protected def withSingleCallEnabled(f: => Unit): Unit = {
+ val singleCallConfig =
+ s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call"
+ try {
+ SparkRangerAdminPlugin.getRangerConf.setBoolean(singleCallConfig, true)
+ f
+ } finally {
+ SparkRangerAdminPlugin.getRangerConf.setBoolean(singleCallConfig, false)
+ }
+ }
+
test("[KYUUBI #3226] RuleAuthorization: Should check privileges once only.") {
- val logicalPlan = doAs("admin", sql("SHOW TABLES").queryExecution.logical)
+ val logicalPlan = doAs(admin, sql("SHOW TABLES").queryExecution.logical)
val rule = new RuleAuthorization(spark)
(1 until 10).foreach { i =>
@@ -116,7 +133,7 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
withCleanTmpResources(Seq((testTable, "table"))) {
// create tmp table
doAs(
- "admin", {
+ admin, {
sql(create)
// session1: first query, should auth once.[LogicalRelation]
@@ -155,18 +172,18 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
val e = intercept[AccessControlException](sql(create))
assert(e.getMessage === errorMessage("create", "mydb"))
withCleanTmpResources(Seq((testDb, "database"))) {
- doAs("admin", assert(Try { sql(create) }.isSuccess))
- doAs("admin", assert(Try { sql(alter) }.isSuccess))
+ doAs(admin, assert(Try { sql(create) }.isSuccess))
+ doAs(admin, assert(Try { sql(alter) }.isSuccess))
val e1 = intercept[AccessControlException](sql(alter))
assert(e1.getMessage === errorMessage("alter", "mydb"))
val e2 = intercept[AccessControlException](sql(drop))
assert(e2.getMessage === errorMessage("drop", "mydb"))
- doAs("kent", Try(sql("SHOW DATABASES")).isSuccess)
+ doAs(kent, Try(sql("SHOW DATABASES")).isSuccess)
}
}
test("auth: tables") {
- val db = "default"
+ val db = defaultDb
val table = "src"
val col = "key"
@@ -178,14 +195,14 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
assert(e.getMessage === errorMessage("create"))
withCleanTmpResources(Seq((s"$db.$table", "table"))) {
- doAs("bob", assert(Try { sql(create0) }.isSuccess))
- doAs("bob", assert(Try { sql(alter0) }.isSuccess))
+ doAs(bob, assert(Try { sql(create0) }.isSuccess))
+ doAs(bob, assert(Try { sql(alter0) }.isSuccess))
val e1 = intercept[AccessControlException](sql(drop0))
assert(e1.getMessage === errorMessage("drop"))
- doAs("bob", assert(Try { sql(alter0) }.isSuccess))
- doAs("bob", assert(Try { sql(select).collect() }.isSuccess))
- doAs("kent", assert(Try { sql(s"SELECT key FROM $db.$table").collect() }.isSuccess))
+ doAs(bob, assert(Try { sql(alter0) }.isSuccess))
+ doAs(bob, assert(Try { sql(select).collect() }.isSuccess))
+ doAs(kent, assert(Try { sql(s"SELECT key FROM $db.$table").collect() }.isSuccess))
Seq(
select,
@@ -196,10 +213,10 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
s"SELECT key FROM $db.$table WHERE value in (SELECT value as key FROM $db.$table)")
.foreach { q =>
doAs(
- "kent", {
+ kent, {
withClue(q) {
val e = intercept[AccessControlException](sql(q).collect())
- assert(e.getMessage === errorMessage("select", "default/src/value", "kent"))
+ assert(e.getMessage === errorMessage("select", "default/src/value", kent))
}
})
}
@@ -207,15 +224,15 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
}
test("auth: functions") {
- val db = "default"
+ val db = defaultDb
val func = "func"
val create0 = s"CREATE FUNCTION IF NOT EXISTS $db.$func AS 'abc.mnl.xyz'"
doAs(
- "kent", {
+ kent, {
val e = intercept[AccessControlException](sql(create0))
assert(e.getMessage === errorMessage("create", "default/func"))
})
- doAs("admin", assert(Try(sql(create0)).isSuccess))
+ doAs(admin, assert(Try(sql(create0)).isSuccess))
}
test("show tables") {
@@ -226,14 +243,14 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
(s"$db.$table", "table"),
(s"$db.${table}for_show", "table"),
(s"$db", "database"))) {
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $db"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.$table (key int) USING $format"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}for_show (key int) USING $format"))
-
- doAs("admin", assert(sql(s"show tables from $db").collect().length === 2))
- doAs("bob", assert(sql(s"show tables from $db").collect().length === 0))
- doAs("i_am_invisible", assert(sql(s"show tables from $db").collect().length === 0))
- doAs("i_am_invisible", assert(sql(s"show tables from $db").limit(1).isEmpty))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $db"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.$table (key int) USING $format"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}for_show (key int) USING $format"))
+
+ doAs(admin, assert(sql(s"show tables from $db").collect().length === 2))
+ doAs(bob, assert(sql(s"show tables from $db").collect().length === 0))
+ doAs(invisibleUser, assert(sql(s"show tables from $db").collect().length === 0))
+ doAs(invisibleUser, assert(sql(s"show tables from $db").limit(1).isEmpty))
}
}
@@ -241,19 +258,19 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
val db = "default2"
withCleanTmpResources(Seq((db, "database"))) {
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $db"))
- doAs("admin", assert(sql(s"SHOW DATABASES").collect().length == 2))
- doAs("admin", assert(sql(s"SHOW DATABASES").collectAsList().get(0).getString(0) == "default"))
- doAs("admin", assert(sql(s"SHOW DATABASES").collectAsList().get(1).getString(0) == s"$db"))
-
- doAs("bob", assert(sql(s"SHOW DATABASES").collect().length == 1))
- doAs("bob", assert(sql(s"SHOW DATABASES").collectAsList().get(0).getString(0) == "default"))
- doAs("i_am_invisible", assert(sql(s"SHOW DATABASES").limit(1).isEmpty))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $db"))
+ doAs(admin, assert(sql(s"SHOW DATABASES").collect().length == 2))
+ doAs(admin, assert(sql(s"SHOW DATABASES").collectAsList().get(0).getString(0) == defaultDb))
+ doAs(admin, assert(sql(s"SHOW DATABASES").collectAsList().get(1).getString(0) == s"$db"))
+
+ doAs(bob, assert(sql(s"SHOW DATABASES").collect().length == 1))
+ doAs(bob, assert(sql(s"SHOW DATABASES").collectAsList().get(0).getString(0) == defaultDb))
+ doAs(invisibleUser, assert(sql(s"SHOW DATABASES").limit(1).isEmpty))
}
}
test("show functions") {
- val default = "default"
+ val default = defaultDb
val db3 = "default3"
val function1 = "function1"
@@ -261,41 +278,41 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
(s"$default.$function1", "function"),
(s"$db3.$function1", "function"),
(db3, "database"))) {
- doAs("admin", sql(s"CREATE FUNCTION $function1 AS 'Function1'"))
- doAs("admin", assert(sql(s"show user functions $default.$function1").collect().length == 1))
- doAs("bob", assert(sql(s"show user functions $default.$function1").collect().length == 0))
+ doAs(admin, sql(s"CREATE FUNCTION $function1 AS 'Function1'"))
+ doAs(admin, assert(sql(s"show user functions $default.$function1").collect().length == 1))
+ doAs(bob, assert(sql(s"show user functions $default.$function1").collect().length == 0))
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $db3"))
- doAs("admin", sql(s"CREATE FUNCTION $db3.$function1 AS 'Function1'"))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $db3"))
+ doAs(admin, sql(s"CREATE FUNCTION $db3.$function1 AS 'Function1'"))
- doAs("admin", assert(sql(s"show user functions $db3.$function1").collect().length == 1))
- doAs("bob", assert(sql(s"show user functions $db3.$function1").collect().length == 0))
+ doAs(admin, assert(sql(s"show user functions $db3.$function1").collect().length == 1))
+ doAs(bob, assert(sql(s"show user functions $db3.$function1").collect().length == 0))
- doAs("admin", assert(sql(s"show system functions").collect().length > 0))
- doAs("bob", assert(sql(s"show system functions").collect().length > 0))
+ doAs(admin, assert(sql(s"show system functions").collect().length > 0))
+ doAs(bob, assert(sql(s"show system functions").collect().length > 0))
- val adminSystemFunctionCount = doAs("admin", sql(s"show system functions").collect().length)
- val bobSystemFunctionCount = doAs("bob", sql(s"show system functions").collect().length)
+ val adminSystemFunctionCount = doAs(admin, sql(s"show system functions").collect().length)
+ val bobSystemFunctionCount = doAs(bob, sql(s"show system functions").collect().length)
assert(adminSystemFunctionCount == bobSystemFunctionCount)
}
}
test("show columns") {
- val db = "default"
+ val db = defaultDb
val table = "src"
val col = "key"
val create = s"CREATE TABLE IF NOT EXISTS $db.$table ($col int, value int) USING $format"
withCleanTmpResources(Seq((s"$db.$table", "table"))) {
- doAs("admin", sql(create))
+ doAs(admin, sql(create))
- doAs("admin", assert(sql(s"SHOW COLUMNS IN $table").count() == 2))
- doAs("admin", assert(sql(s"SHOW COLUMNS IN $db.$table").count() == 2))
- doAs("admin", assert(sql(s"SHOW COLUMNS IN $table IN $db").count() == 2))
+ doAs(admin, assert(sql(s"SHOW COLUMNS IN $table").count() == 2))
+ doAs(admin, assert(sql(s"SHOW COLUMNS IN $db.$table").count() == 2))
+ doAs(admin, assert(sql(s"SHOW COLUMNS IN $table IN $db").count() == 2))
- doAs("kent", assert(sql(s"SHOW COLUMNS IN $table").count() == 1))
- doAs("kent", assert(sql(s"SHOW COLUMNS IN $db.$table").count() == 1))
- doAs("kent", assert(sql(s"SHOW COLUMNS IN $table IN $db").count() == 1))
+ doAs(kent, assert(sql(s"SHOW COLUMNS IN $table").count() == 1))
+ doAs(kent, assert(sql(s"SHOW COLUMNS IN $db.$table").count() == 1))
+ doAs(kent, assert(sql(s"SHOW COLUMNS IN $table IN $db").count() == 1))
}
}
@@ -310,24 +327,24 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
(s"$db.${table}_select2", "table"),
(s"$db.${table}_select3", "table"),
(s"$db", "database"))) {
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $db"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_use1 (key int) USING $format"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_use2 (key int) USING $format"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select1 (key int) USING $format"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select2 (key int) USING $format"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select3 (key int) USING $format"))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $db"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_use1 (key int) USING $format"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_use2 (key int) USING $format"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select1 (key int) USING $format"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select2 (key int) USING $format"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.${table}_select3 (key int) USING $format"))
doAs(
- "admin",
+ admin,
assert(sql(s"show table extended from $db like '$table*'").collect().length === 5))
doAs(
- "bob",
+ bob,
assert(sql(s"show tables from $db").collect().length === 5))
doAs(
- "bob",
+ bob,
assert(sql(s"show table extended from $db like '$table*'").collect().length === 3))
doAs(
- "i_am_invisible",
+ invisibleUser,
assert(sql(s"show table extended from $db like '$table*'").collect().length === 0))
}
}
@@ -339,48 +356,48 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
val globalTempView2 = "global_temp_view2"
// create or replace view
- doAs("denyuser", sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
+ doAs(denyUser, sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE GLOBAL TEMPORARY VIEW $globalTempView AS SELECT * FROM values(1)"))
// rename view
- doAs("denyuser2", sql(s"ALTER VIEW $tempView RENAME TO $tempView2"))
+ doAs(denyUser2, sql(s"ALTER VIEW $tempView RENAME TO $tempView2"))
doAs(
- "denyuser2",
+ denyUser2,
sql(s"ALTER VIEW global_temp.$globalTempView RENAME TO global_temp.$globalTempView2"))
- doAs("admin", sql(s"DROP VIEW IF EXISTS $tempView2"))
- doAs("admin", sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView2"))
- doAs("admin", assert(sql("show tables from global_temp").collect().length == 0))
+ doAs(admin, sql(s"DROP VIEW IF EXISTS $tempView2"))
+ doAs(admin, sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView2"))
+ doAs(admin, assert(sql("show tables from global_temp").collect().length == 0))
}
test("[KYUUBI #3426] Drop temp view should be skipped permission check") {
val tempView = "temp_view"
val globalTempView = "global_temp_view"
- doAs("denyuser", sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
+ doAs(denyUser, sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE OR REPLACE TEMPORARY VIEW $tempView" +
s" AS select * from values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE GLOBAL TEMPORARY VIEW $globalTempView AS SELECT * FROM values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE OR REPLACE GLOBAL TEMPORARY VIEW $globalTempView" +
s" AS select * from values(1)"))
// global_temp will contain the temporary view, even if it is not global
- doAs("admin", assert(sql("show tables from global_temp").collect().length == 2))
+ doAs(admin, assert(sql("show tables from global_temp").collect().length == 2))
- doAs("denyuser2", sql(s"DROP VIEW IF EXISTS $tempView"))
- doAs("denyuser2", sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView"))
+ doAs(denyUser2, sql(s"DROP VIEW IF EXISTS $tempView"))
+ doAs(denyUser2, sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView"))
- doAs("admin", assert(sql("show tables from global_temp").collect().length == 0))
+ doAs(admin, assert(sql("show tables from global_temp").collect().length == 0))
}
test("[KYUUBI #3428] AlterViewAsCommand should be skipped permission check") {
@@ -388,26 +405,26 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
val globalTempView = "global_temp_view"
// create or replace view
- doAs("denyuser", sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
+ doAs(denyUser, sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE OR REPLACE TEMPORARY VIEW $tempView" +
s" AS select * from values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE GLOBAL TEMPORARY VIEW $globalTempView AS SELECT * FROM values(1)"))
doAs(
- "denyuser",
+ denyUser,
sql(s"CREATE OR REPLACE GLOBAL TEMPORARY VIEW $globalTempView" +
s" AS select * from values(1)"))
// rename view
- doAs("denyuser2", sql(s"ALTER VIEW $tempView AS SELECT * FROM values(1)"))
- doAs("denyuser2", sql(s"ALTER VIEW global_temp.$globalTempView AS SELECT * FROM values(1)"))
+ doAs(denyUser2, sql(s"ALTER VIEW $tempView AS SELECT * FROM values(1)"))
+ doAs(denyUser2, sql(s"ALTER VIEW global_temp.$globalTempView AS SELECT * FROM values(1)"))
- doAs("admin", sql(s"DROP VIEW IF EXISTS $tempView"))
- doAs("admin", sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView"))
- doAs("admin", assert(sql("show tables from global_temp").collect().length == 0))
+ doAs(admin, sql(s"DROP VIEW IF EXISTS $tempView"))
+ doAs(admin, sql(s"DROP VIEW IF EXISTS global_temp.$globalTempView"))
+ doAs(admin, assert(sql("show tables from global_temp").collect().length == 0))
}
test("[KYUUBI #3343] pass temporary view creation") {
@@ -416,28 +433,39 @@ abstract class RangerSparkExtensionSuite extends AnyFunSuite
withTempView(tempView) {
doAs(
- "denyuser",
+ denyUser,
assert(Try(sql(s"CREATE TEMPORARY VIEW $tempView AS select * from values(1)")).isSuccess))
doAs(
- "denyuser",
+ denyUser,
Try(sql(s"CREATE OR REPLACE TEMPORARY VIEW $tempView" +
s" AS select * from values(1)")).isSuccess)
}
withGlobalTempView(globalTempView) {
doAs(
- "denyuser",
+ denyUser,
Try(
sql(
s"CREATE GLOBAL TEMPORARY VIEW $globalTempView AS SELECT * FROM values(1)")).isSuccess)
doAs(
- "denyuser",
+ denyUser,
Try(sql(s"CREATE OR REPLACE GLOBAL TEMPORARY VIEW $globalTempView" +
s" AS select * from values(1)")).isSuccess)
}
- doAs("admin", assert(sql("show tables from global_temp").collect().length == 0))
+ doAs(admin, assert(sql("show tables from global_temp").collect().length == 0))
+ }
+
+ test("[KYUUBI #5172] Check USE permissions for DESCRIBE FUNCTION") {
+ val fun = s"$defaultDb.function1"
+
+ withCleanTmpResources(Seq((s"$fun", "function"))) {
+ doAs(admin, sql(s"CREATE FUNCTION $fun AS 'Function1'"))
+ doAs(admin, assert(sql(s"DESC FUNCTION $fun").collect().length == 1))
+ val e = intercept[AccessControlException](doAs(denyUser, sql(s"DESC FUNCTION $fun")))
+ assert(e.getMessage === errorMessage("_any", "default/function1", denyUser))
+ }
}
}
@@ -450,12 +478,12 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
test("table stats must be specified") {
val table = "hive_src"
withCleanTmpResources(Seq((table, "table"))) {
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $table (id int)"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $table (id int)"))
doAs(
- "admin", {
+ admin, {
val hiveTableRelation = sql(s"SELECT * FROM $table")
.queryExecution.optimizedPlan.collectLeaves().head.asInstanceOf[HiveTableRelation]
- assert(getFieldVal[Option[Statistics]](hiveTableRelation, "tableStats").nonEmpty)
+ assert(getField[Option[Statistics]](hiveTableRelation, "tableStats").nonEmpty)
})
}
}
@@ -463,9 +491,9 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
test("HiveTableRelation should be able to be converted to LogicalRelation") {
val table = "hive_src"
withCleanTmpResources(Seq((table, "table"))) {
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $table (id int) STORED AS PARQUET"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $table (id int) STORED AS PARQUET"))
doAs(
- "admin", {
+ admin, {
val relation = sql(s"SELECT * FROM $table")
.queryExecution.optimizedPlan.collectLeaves().head
assert(relation.isInstanceOf[LogicalRelation])
@@ -483,7 +511,7 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
(s"$db.$table1", "table"),
(s"$db", "database"))) {
doAs(
- "admin", {
+ admin, {
sql(s"CREATE DATABASE IF NOT EXISTS $db")
sql(s"CREATE TABLE IF NOT EXISTS $db.$table1(id int) STORED AS PARQUET")
sql(s"INSERT INTO $db.$table1 SELECT 1")
@@ -504,16 +532,16 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
(adminPermView, "view"),
(permView, "view"),
(table, "table"))) {
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $table (id int)"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $table (id int)"))
- doAs("admin", sql(s"CREATE VIEW ${adminPermView} AS SELECT * FROM $table"))
+ doAs(admin, sql(s"CREATE VIEW ${adminPermView} AS SELECT * FROM $table"))
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"CREATE VIEW $permView AS SELECT 1 as a")))
+ doAs(someone, sql(s"CREATE VIEW $permView AS SELECT 1 as a")))
assert(e1.getMessage.contains(s"does not have [create] privilege on [default/$permView]"))
val e2 = intercept[AccessControlException](
- doAs("someone", sql(s"CREATE VIEW $permView AS SELECT * FROM $table")))
+ doAs(someone, sql(s"CREATE VIEW $permView AS SELECT * FROM $table")))
if (isSparkV32OrGreater) {
assert(e2.getMessage.contains(s"does not have [select] privilege on [default/$table/id]"))
} else {
@@ -523,20 +551,20 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
}
test("[KYUUBI #3326] check persisted view and skip shadowed table") {
- val db1 = "default"
+ val db1 = defaultDb
val table = "hive_src"
val permView = "perm_view"
withCleanTmpResources(Seq(
(s"$db1.$table", "table"),
(s"$db1.$permView", "view"))) {
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
- doAs("admin", sql(s"CREATE VIEW $db1.$permView AS SELECT * FROM $db1.$table"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
+ doAs(admin, sql(s"CREATE VIEW $db1.$permView AS SELECT * FROM $db1.$table"))
// KYUUBI #3326: with no privileges to the permanent view or the source table
val e1 = intercept[AccessControlException](
doAs(
- "someone", {
+ someone, {
sql(s"select * from $db1.$permView").collect()
}))
if (isSparkV31OrGreater) {
@@ -548,16 +576,16 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
}
test("KYUUBI #4504: query permanent view with privilege to permanent view only") {
- val db1 = "default"
+ val db1 = defaultDb
val table = "hive_src"
val permView = "perm_view"
- val userPermViewOnly = "user_perm_view_only"
+ val userPermViewOnly = permViewOnlyUser
withCleanTmpResources(Seq(
(s"$db1.$table", "table"),
(s"$db1.$permView", "view"))) {
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
- doAs("admin", sql(s"CREATE VIEW $db1.$permView AS SELECT * FROM $db1.$table"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
+ doAs(admin, sql(s"CREATE VIEW $db1.$permView AS SELECT * FROM $db1.$table"))
// query all columns of the permanent view
// with access privileges to the permanent view but no privilege to the source table
@@ -582,7 +610,7 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
}
test("[KYUUBI #3371] support throws all disallowed privileges in exception") {
- val db1 = "default"
+ val db1 = defaultDb
val srcTable1 = "hive_src1"
val srcTable2 = "hive_src2"
val sinkTable1 = "hive_sink1"
@@ -592,17 +620,17 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
(s"$db1.$srcTable2", "table"),
(s"$db1.$sinkTable1", "table"))) {
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $db1.$srcTable1" +
s" (id int, name string, city string)"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $db1.$srcTable2" +
s" (id int, age int)"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $db1.$sinkTable1" +
s" (id int, age int, name string, city string)"))
@@ -611,25 +639,17 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
s" FROM $db1.$srcTable1 as tb1" +
s" JOIN $db1.$srcTable2 as tb2" +
s" on tb1.id = tb2.id"
- val e1 = intercept[AccessControlException](doAs("someone", sql(insertSql1)))
+ val e1 = intercept[AccessControlException](doAs(someone, sql(insertSql1)))
assert(e1.getMessage.contains(s"does not have [select] privilege on [$db1/$srcTable1/id]"))
- try {
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- true)
- val e2 = intercept[AccessControlException](doAs("someone", sql(insertSql1)))
+ withSingleCallEnabled {
+ val e2 = intercept[AccessControlException](doAs(someone, sql(insertSql1)))
assert(e2.getMessage.contains(s"does not have" +
s" [select] privilege on" +
s" [$db1/$srcTable1/id,$db1/$srcTable1/name,$db1/$srcTable1/city," +
s"$db1/$srcTable2/age,$db1/$srcTable2/id]," +
s" [update] privilege on [$db1/$sinkTable1/id,$db1/$sinkTable1/age," +
s"$db1/$sinkTable1/name,$db1/$sinkTable1/city]"))
- } finally {
- // revert to default value
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- false)
}
}
}
@@ -637,7 +657,7 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
test("[KYUUBI #3411] skip checking cache table") {
if (isSparkV32OrGreater) { // cache table sql supported since 3.2.0
- val db1 = "default"
+ val db1 = defaultDb
val srcTable1 = "hive_src1"
val cacheTable1 = "cacheTable1"
val cacheTable2 = "cacheTable2"
@@ -652,23 +672,23 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
(s"$db1.$cacheTable4", "cache"))) {
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $db1.$srcTable1" +
s" (id int, name string, city string)"))
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"CACHE TABLE $cacheTable2 select * from $db1.$srcTable1")))
+ doAs(someone, sql(s"CACHE TABLE $cacheTable2 select * from $db1.$srcTable1")))
assert(
e1.getMessage.contains(s"does not have [select] privilege on [$db1/$srcTable1/id]"))
- doAs("admin", sql(s"CACHE TABLE $cacheTable3 SELECT 1 AS a, 2 AS b "))
- doAs("someone", sql(s"CACHE TABLE $cacheTable4 select 1 as a, 2 as b "))
+ doAs(admin, sql(s"CACHE TABLE $cacheTable3 SELECT 1 AS a, 2 AS b "))
+ doAs(someone, sql(s"CACHE TABLE $cacheTable4 select 1 as a, 2 as b "))
}
}
}
test("[KYUUBI #3608] Support {OWNER} variable for queries") {
- val db = "default"
+ val db = defaultDb
val table = "owner_variable"
val select = s"SELECT key FROM $db.$table"
@@ -687,7 +707,7 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
}.isSuccess))
doAs(
- "create_only_user", {
+ createOnlyUser, {
val e = intercept[AccessControlException](sql(select).collect())
assert(e.getMessage === errorMessage("select", s"$db/$table/key"))
})
@@ -701,10 +721,44 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
Seq(
(s"$db.$table", "table"),
(s"$db", "database"))) {
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $db"))
- doAs("admin", sql(s"CREATE TABLE IF NOT EXISTS $db.$table (key int) USING $format"))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $db"))
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db.$table (key int) USING $format"))
sql("SHOW DATABASES").queryExecution.optimizedPlan.stats
sql(s"SHOW TABLES IN $db").queryExecution.optimizedPlan.stats
}
}
+
+ test("[KYUUBI #4658] insert overwrite hive directory") {
+ val db1 = defaultDb
+ val table = "src"
+
+ withCleanTmpResources(Seq((s"$db1.$table", "table"))) {
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
+ val e = intercept[AccessControlException](
+ doAs(
+ someone,
+ sql(
+ s"""INSERT OVERWRITE DIRECTORY '/tmp/test_dir' ROW FORMAT DELIMITED FIELDS
+ | TERMINATED BY ','
+ | SELECT * FROM $db1.$table;""".stripMargin)))
+ assert(e.getMessage.contains(s"does not have [select] privilege on [$db1/$table/id]"))
+ }
+ }
+
+ test("[KYUUBI #4658] insert overwrite datasource directory") {
+ val db1 = defaultDb
+ val table = "src"
+
+ withCleanTmpResources(Seq((s"$db1.$table", "table"))) {
+ doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db1.$table (id int, name string)"))
+ val e = intercept[AccessControlException](
+ doAs(
+ someone,
+ sql(
+ s"""INSERT OVERWRITE DIRECTORY '/tmp/test_dir'
+ | USING parquet
+ | SELECT * FROM $db1.$table;""".stripMargin)))
+ assert(e.getMessage.contains(s"does not have [select] privilege on [$db1/$table/id]"))
+ }
+ }
}
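The `withSingleCallEnabled` helper introduced in this diff is an instance of the loan pattern: flip a config, run the block, revert in `finally`. A generic, self-contained sketch of that pattern (hypothetical key name and config store; the real helper targets `SparkRangerAdminPlugin.getRangerConf`):

```scala
object LoanPatternSketch {
  // A tiny stand-in for a mutable configuration store (hypothetical).
  private val conf = scala.collection.mutable.Map.empty[String, Boolean]

  // Set `key` to `value` for the duration of `f`, then restore the prior state.
  def withFlag[T](key: String, value: Boolean)(f: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try f
    finally previous match {
      case Some(v) => conf(key) = v
      case None    => conf.remove(key) // key was unset before: unset it again
    }
  }

  def main(args: Array[String]): Unit = {
    val inside = withFlag("authorize.in.single.call", value = true) {
      conf("authorize.in.single.call")
    }
    assert(inside)
    assert(!conf.contains("authorize.in.single.call")) // fully restored
  }
}
```

One difference worth noting: this sketch restores the previous value rather than hard-coding `false` as the suite's helper does, which stays correct even if the surrounding configuration ever changes its default.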
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPluginSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPluginSuite.scala
index 8711a728726..301ae87c553 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPluginSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/SparkRangerAdminPluginSuite.scala
@@ -22,6 +22,8 @@ import org.apache.hadoop.security.UserGroupInformation
import org.scalatest.funsuite.AnyFunSuite
import org.apache.kyuubi.plugin.spark.authz.{ObjectType, OperationType}
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin._
class SparkRangerAdminPluginSuite extends AnyFunSuite {
@@ -29,13 +31,13 @@ class SparkRangerAdminPluginSuite extends AnyFunSuite {
test("get filter expression") {
val bob = UserGroupInformation.createRemoteUser("bob")
- val are = AccessResource(ObjectType.TABLE, "default", "src", null)
+ val are = AccessResource(ObjectType.TABLE, defaultDb, "src", null)
def buildAccessRequest(ugi: UserGroupInformation): AccessRequest = {
AccessRequest(are, ugi, OperationType.QUERY, AccessType.SELECT)
}
val maybeString = getFilterExpr(buildAccessRequest(bob))
assert(maybeString.get === "key<20")
- Seq("admin", "alice").foreach { user =>
+ Seq(admin, alice).foreach { user =>
val ugi = UserGroupInformation.createRemoteUser(user)
val maybeString = getFilterExpr(buildAccessRequest(ugi))
assert(maybeString.isEmpty)
@@ -45,18 +47,21 @@ class SparkRangerAdminPluginSuite extends AnyFunSuite {
test("get data masker") {
val bob = UserGroupInformation.createRemoteUser("bob")
def buildAccessRequest(ugi: UserGroupInformation, column: String): AccessRequest = {
- val are = AccessResource(ObjectType.COLUMN, "default", "src", column)
+ val are = AccessResource(ObjectType.COLUMN, defaultDb, "src", column)
AccessRequest(are, ugi, OperationType.QUERY, AccessType.SELECT)
}
assert(getMaskingExpr(buildAccessRequest(bob, "value1")).get === "md5(cast(value1 as string))")
assert(getMaskingExpr(buildAccessRequest(bob, "value2")).get ===
- "regexp_replace(regexp_replace(regexp_replace(value2, '[A-Z]', 'X'), '[a-z]', 'x')," +
- " '[0-9]', 'n')")
+ "regexp_replace(regexp_replace(regexp_replace(regexp_replace(value2, '[A-Z]', 'X')," +
+ " '[a-z]', 'x'), '[0-9]', 'n'), '[^A-Za-z0-9]', 'U')")
assert(getMaskingExpr(buildAccessRequest(bob, "value3")).get contains "regexp_replace")
assert(getMaskingExpr(buildAccessRequest(bob, "value4")).get === "date_trunc('YEAR', value4)")
- assert(getMaskingExpr(buildAccessRequest(bob, "value5")).get contains "regexp_replace")
+ assert(getMaskingExpr(buildAccessRequest(bob, "value5")).get ===
+ "concat(regexp_replace(regexp_replace(regexp_replace(regexp_replace(" +
+ "left(value5, length(value5) - 4), '[A-Z]', 'X'), '[a-z]', 'x')," +
+ " '[0-9]', 'n'), '[^A-Za-z0-9]', 'U'), right(value5, 4))")
- Seq("admin", "alice").foreach { user =>
+ Seq(admin, alice).foreach { user =>
val ugi = UserGroupInformation.createRemoteUser(user)
val maybeString = getMaskingExpr(buildAccessRequest(ugi, "value1"))
assert(maybeString.isEmpty)
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/V2JdbcTableCatalogRangerSparkExtensionSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/V2JdbcTableCatalogRangerSparkExtensionSuite.scala
index 73a13bc1c3c..5c27a470f74 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/V2JdbcTableCatalogRangerSparkExtensionSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/V2JdbcTableCatalogRangerSparkExtensionSuite.scala
@@ -22,6 +22,9 @@ import scala.util.Try
// scalastyle:off
import org.apache.kyuubi.plugin.spark.authz.AccessControlException
+import org.apache.kyuubi.plugin.spark.authz.RangerTestNamespace._
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
/**
* Tests for RangerSparkExtensionSuite
@@ -32,8 +35,6 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
val catalogV2 = "testcat"
val jdbcCatalogV2 = "jdbc2"
- val namespace1 = "ns1"
- val namespace2 = "ns2"
val table1 = "table1"
val table2 = "table2"
val outputTable1 = "outputTable1"
@@ -54,13 +55,13 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
super.beforeAll()
- doAs("admin", sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace1"))
+ doAs(admin, sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace1"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$table1" +
" (id int, name string, city string)"))
doAs(
- "admin",
+ admin,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$outputTable1" +
" (id int, name string, city string)"))
}
@@ -82,7 +83,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// create database
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace2").explain()))
+ doAs(someone, sql(s"CREATE DATABASE IF NOT EXISTS $catalogV2.$namespace2").explain()))
assert(e1.getMessage.contains(s"does not have [create] privilege" +
s" on [$namespace2]"))
}
@@ -92,7 +93,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// create database
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"DROP DATABASE IF EXISTS $catalogV2.$namespace2").explain()))
+ doAs(someone, sql(s"DROP DATABASE IF EXISTS $catalogV2.$namespace2").explain()))
assert(e1.getMessage.contains(s"does not have [drop] privilege" +
s" on [$namespace2]"))
}
@@ -102,7 +103,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// select
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"select city, id from $catalogV2.$namespace1.$table1").explain()))
+ doAs(someone, sql(s"select city, id from $catalogV2.$namespace1.$table1").explain()))
assert(e1.getMessage.contains(s"does not have [select] privilege" +
s" on [$namespace1/$table1/city]"))
}
@@ -110,7 +111,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
test("[KYUUBI #4255] DESCRIBE TABLE") {
assume(isSparkV31OrGreater)
val e1 = intercept[AccessControlException](
- doAs("someone", sql(s"DESCRIBE TABLE $catalogV2.$namespace1.$table1").explain()))
+ doAs(someone, sql(s"DESCRIBE TABLE $catalogV2.$namespace1.$table1").explain()))
assert(e1.getMessage.contains(s"does not have [select] privilege" +
s" on [$namespace1/$table1]"))
}
@@ -120,14 +121,14 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// CreateTable
val e2 = intercept[AccessControlException](
- doAs("someone", sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$table2")))
+ doAs(someone, sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$table2")))
assert(e2.getMessage.contains(s"does not have [create] privilege" +
s" on [$namespace1/$table2]"))
// CreateTableAsSelect
val e21 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"CREATE TABLE IF NOT EXISTS $catalogV2.$namespace1.$table2" +
s" AS select * from $catalogV2.$namespace1.$table1")))
assert(e21.getMessage.contains(s"does not have [select] privilege" +
@@ -139,7 +140,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// DropTable
val e3 = intercept[AccessControlException](
- doAs("someone", sql(s"DROP TABLE $catalogV2.$namespace1.$table1")))
+ doAs(someone, sql(s"DROP TABLE $catalogV2.$namespace1.$table1")))
assert(e3.getMessage.contains(s"does not have [drop] privilege" +
s" on [$namespace1/$table1]"))
}
@@ -150,7 +151,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// AppendData: Insert Using a VALUES Clause
val e4 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"INSERT INTO $catalogV2.$namespace1.$outputTable1 (id, name, city)" +
s" VALUES (1, 'bowenliang123', 'Guangzhou')")))
assert(e4.getMessage.contains(s"does not have [update] privilege" +
@@ -159,7 +160,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// AppendData: Insert Using a TABLE Statement
val e42 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"INSERT INTO $catalogV2.$namespace1.$outputTable1 (id, name, city)" +
s" TABLE $catalogV2.$namespace1.$table1")))
assert(e42.getMessage.contains(s"does not have [select] privilege" +
@@ -168,7 +169,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// AppendData: Insert Using a SELECT Statement
val e43 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"INSERT INTO $catalogV2.$namespace1.$outputTable1 (id, name, city)" +
s" SELECT * from $catalogV2.$namespace1.$table1")))
assert(e43.getMessage.contains(s"does not have [select] privilege" +
@@ -177,7 +178,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// OverwriteByExpression: Insert Overwrite
val e44 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"INSERT OVERWRITE $catalogV2.$namespace1.$outputTable1 (id, name, city)" +
s" VALUES (1, 'bowenliang123', 'Guangzhou')")))
assert(e44.getMessage.contains(s"does not have [update] privilege" +
@@ -199,27 +200,20 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// MergeIntoTable: Using a MERGE INTO Statement
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(mergeIntoSql)))
assert(e1.getMessage.contains(s"does not have [select] privilege" +
s" on [$namespace1/$table1/id]"))
- try {
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- true)
+ withSingleCallEnabled {
val e2 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(mergeIntoSql)))
assert(e2.getMessage.contains(s"does not have" +
s" [select] privilege" +
s" on [$namespace1/$table1/id,$namespace1/table1/name,$namespace1/$table1/city]," +
s" [update] privilege on [$namespace1/$outputTable1]"))
- } finally {
- SparkRangerAdminPlugin.getRangerConf.setBoolean(
- s"ranger.plugin.${SparkRangerAdminPlugin.getServiceType}.authorize.in.single.call",
- false)
}
}
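The hunk above replaces the removed try/finally around the `ranger.plugin.<serviceType>.authorize.in.single.call` flag with a `withSingleCallEnabled` helper. A minimal loan-pattern sketch of what such a helper does — a mutable map stands in for `SparkRangerAdminPlugin.getRangerConf`, and the conf key is abbreviated; this is an illustration, not the real helper from the test utilities:

```scala
import scala.collection.mutable

// Stand-in for the Ranger plugin configuration object.
val rangerConf = mutable.Map[String, Boolean]().withDefaultValue(false)
val singleCallKey = "ranger.plugin.spark.authorize.in.single.call"

// Loan pattern: enable the flag, run the body, always restore the flag,
// so call sites no longer need the try/finally boilerplate removed above.
def withSingleCallEnabled[T](body: => T): T = {
  rangerConf(singleCallKey) = true
  try body
  finally rangerConf(singleCallKey) = false
}

val seenInside = withSingleCallEnabled { rangerConf(singleCallKey) }
assert(seenInside)                  // flag was on inside the block
assert(!rangerConf(singleCallKey))  // and restored afterwards
```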
@@ -229,7 +223,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// UpdateTable
val e5 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"UPDATE $catalogV2.$namespace1.$table1 SET city='Hangzhou' " +
" WHERE id=1")))
assert(e5.getMessage.contains(s"does not have [update] privilege" +
@@ -241,7 +235,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// DeleteFromTable
val e6 = intercept[AccessControlException](
- doAs("someone", sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=1")))
+ doAs(someone, sql(s"DELETE FROM $catalogV2.$namespace1.$table1 WHERE id=1")))
assert(e6.getMessage.contains(s"does not have [update] privilege" +
s" on [$namespace1/$table1]"))
}
@@ -252,7 +246,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// CacheTable
val e7 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"CACHE TABLE $cacheTable1" +
s" AS select * from $catalogV2.$namespace1.$table1")))
if (isSparkV32OrGreater) {
@@ -269,7 +263,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"TRUNCATE TABLE $catalogV2.$namespace1.$table1")))
assert(e1.getMessage.contains(s"does not have [update] privilege" +
s" on [$namespace1/$table1]"))
@@ -280,7 +274,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"MSCK REPAIR TABLE $catalogV2.$namespace1.$table1")))
assert(e1.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1/$table1]"))
@@ -292,7 +286,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// AddColumns
val e61 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"ALTER TABLE $catalogV2.$namespace1.$table1 ADD COLUMNS (age int) ").explain()))
assert(e61.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1/$table1]"))
@@ -300,7 +294,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// DropColumns
val e62 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"ALTER TABLE $catalogV2.$namespace1.$table1 DROP COLUMNS city ").explain()))
assert(e62.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1/$table1]"))
@@ -308,7 +302,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// RenameColumn
val e63 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"ALTER TABLE $catalogV2.$namespace1.$table1 RENAME COLUMN city TO city2 ").explain()))
assert(e63.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1/$table1]"))
@@ -316,7 +310,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// AlterColumn
val e64 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"ALTER TABLE $catalogV2.$namespace1.$table1 " +
s"ALTER COLUMN city COMMENT 'city' ")))
assert(e64.getMessage.contains(s"does not have [alter] privilege" +
@@ -329,7 +323,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// CommentOnNamespace
val e1 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"COMMENT ON DATABASE $catalogV2.$namespace1 IS 'xYz' ").explain()))
assert(e1.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1]"))
@@ -337,7 +331,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// CommentOnNamespace
val e2 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"COMMENT ON NAMESPACE $catalogV2.$namespace1 IS 'xYz' ").explain()))
assert(e2.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1]"))
@@ -345,7 +339,7 @@ class V2JdbcTableCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSu
// CommentOnTable
val e3 = intercept[AccessControlException](
doAs(
- "someone",
+ someone,
sql(s"COMMENT ON TABLE $catalogV2.$namespace1.$table1 IS 'xYz' ").explain()))
assert(e3.getMessage.contains(s"does not have [alter] privilege" +
s" on [$namespace1/$table1]"))
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForIcebergSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForIcebergSuite.scala
index 99b7eb97300..905cd428cab 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForIcebergSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForIcebergSuite.scala
@@ -21,6 +21,7 @@ import org.apache.spark.SparkConf
import org.scalatest.Outcome
import org.apache.kyuubi.Utils
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
class DataMaskingForIcebergSuite extends DataMaskingTestBase {
override protected val extraSparkConf: SparkConf = {
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForJDBCV2Suite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForJDBCV2Suite.scala
index 894daeaf711..f74092d0b45 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForJDBCV2Suite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingForJDBCV2Suite.scala
@@ -23,6 +23,8 @@ import scala.util.Try
import org.apache.spark.SparkConf
import org.scalatest.Outcome
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+
class DataMaskingForJDBCV2Suite extends DataMaskingTestBase {
override protected val extraSparkConf: SparkConf = {
val conf = new SparkConf()
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingTestBase.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingTestBase.scala
index 3585397c6fa..af87a39a0af 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingTestBase.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/datamasking/DataMaskingTestBase.scala
@@ -17,18 +17,20 @@
package org.apache.kyuubi.plugin.spark.authz.ranger.datamasking
-// scalastyle:off
import java.sql.Timestamp
import scala.util.Try
+// scalastyle:off
import org.apache.commons.codec.digest.DigestUtils.md5Hex
import org.apache.spark.sql.{Row, SparkSessionExtensions}
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.SparkSessionProvider
import org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
/**
* Base trait for data masking tests, derivative classes shall name themselves following:
@@ -55,6 +57,17 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
"SELECT 20, 2, 'kyuubi', 'y', timestamp'2018-11-17 12:34:56', 'world'")
sql("INSERT INTO default.src " +
"SELECT 30, 3, 'spark', 'a', timestamp'2018-11-17 12:34:56', 'world'")
+
+ // scalastyle:off
+ val value1 = "hello WORD 123 ~!@# AßþΔЙקم๗ቐあア叶葉엽"
+ val value2 = "AßþΔЙקم๗ቐあア叶葉엽 hello WORD 123 ~!@#"
+ // AßþΔЙקم๗ቐあア叶葉엽 reference https://zh.wikipedia.org/zh-cn/Unicode#XML.E5.92.8CUnicode
+ // scalastyle:on
+ sql(s"INSERT INTO default.src " +
+ s"SELECT 10, 4, '$value1', '$value1', timestamp'2018-11-17 12:34:56', '$value1'")
+ sql("INSERT INTO default.src " +
+ s"SELECT 11, 5, '$value2', '$value2', timestamp'2018-11-17 12:34:56', '$value2'")
+
sql(s"CREATE TABLE default.unmasked $format AS SELECT * FROM default.src")
}
@@ -64,41 +77,49 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
}
override def beforeAll(): Unit = {
- doAs("admin", setup())
+ doAs(admin, setup())
super.beforeAll()
}
override def afterAll(): Unit = {
- doAs("admin", cleanup())
+ doAs(admin, cleanup())
spark.stop
super.afterAll()
}
test("simple query with a user doesn't have mask rules") {
- checkAnswer("kent", "SELECT key FROM default.src order by key", Seq(Row(1), Row(20), Row(30)))
+ checkAnswer(
+ kent,
+ "SELECT key FROM default.src order by key",
+ Seq(Row(1), Row(10), Row(11), Row(20), Row(30)))
}
test("simple query with a user has mask rules") {
val result =
Seq(Row(md5Hex("1"), "xxxxx", "worlx", Timestamp.valueOf("2018-01-01 00:00:00"), "Xorld"))
- checkAnswer("bob", "SELECT value1, value2, value3, value4, value5 FROM default.src", result)
checkAnswer(
- "bob",
- "SELECT value1 as key, value2, value3, value4, value5 FROM default.src",
+ bob,
+ "SELECT value1, value2, value3, value4, value5 FROM default.src " +
+ "where key = 1",
+ result)
+ checkAnswer(
+ bob,
+ "SELECT value1 as key, value2, value3, value4, value5 FROM default.src where key = 1",
result)
}
test("star") {
val result =
Seq(Row(1, md5Hex("1"), "xxxxx", "worlx", Timestamp.valueOf("2018-01-01 00:00:00"), "Xorld"))
- checkAnswer("bob", "SELECT * FROM default.src", result)
+ checkAnswer(bob, "SELECT * FROM default.src where key = 1", result)
}
test("simple udf") {
val result =
Seq(Row(md5Hex("1"), "xxxxx", "worlx", Timestamp.valueOf("2018-01-01 00:00:00"), "Xorld"))
checkAnswer(
- "bob",
- "SELECT max(value1), max(value2), max(value3), max(value4), max(value5) FROM default.src",
+ bob,
+ "SELECT max(value1), max(value2), max(value3), max(value4), max(value5) FROM default.src" +
+ " where key = 1",
result)
}
@@ -106,10 +127,10 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
val result =
Seq(Row(md5Hex("1"), "xxxxx", "worlx", Timestamp.valueOf("2018-01-01 00:00:00"), "Xorld"))
checkAnswer(
- "bob",
+ bob,
"SELECT coalesce(max(value1), 1), coalesce(max(value2), 1), coalesce(max(value3), 1), " +
"coalesce(max(value4), timestamp '2018-01-01 22:33:44'), coalesce(max(value5), 1) " +
- "FROM default.src",
+ "FROM default.src where key = 1",
result)
}
@@ -117,53 +138,68 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
val result =
Seq(Row(md5Hex("1"), "xxxxx", "worlx", Timestamp.valueOf("2018-01-01 00:00:00"), "Xorld"))
checkAnswer(
- "bob",
+ bob,
"SELECT value1, value2, value3, value4, value5 FROM default.src WHERE value2 in " +
- "(SELECT value2 as key FROM default.src)",
+ "(SELECT value2 as key FROM default.src where key = 1)",
result)
}
test("create a unmasked table as select from a masked one") {
withCleanTmpResources(Seq(("default.src2", "table"))) {
- doAs("bob", sql(s"CREATE TABLE default.src2 $format AS SELECT value1 FROM default.src"))
- checkAnswer("bob", "SELECT value1 FROM default.src2", Seq(Row(md5Hex("1"))))
+ doAs(
+ bob,
+ sql(s"CREATE TABLE default.src2 $format AS SELECT value1 FROM default.src " +
+ s"where key = 1"))
+ checkAnswer(bob, "SELECT value1 FROM default.src2", Seq(Row(md5Hex("1"))))
}
}
test("insert into a unmasked table from a masked one") {
withCleanTmpResources(Seq(("default.src2", "table"), ("default.src3", "table"))) {
- doAs("bob", sql(s"CREATE TABLE default.src2 (value1 string) $format"))
- doAs("bob", sql(s"INSERT INTO default.src2 SELECT value1 from default.src"))
- doAs("bob", sql(s"INSERT INTO default.src2 SELECT value1 as v from default.src"))
- checkAnswer("bob", "SELECT value1 FROM default.src2", Seq(Row(md5Hex("1")), Row(md5Hex("1"))))
- doAs("bob", sql(s"CREATE TABLE default.src3 (k int, value string) $format"))
- doAs("bob", sql(s"INSERT INTO default.src3 SELECT key, value1 from default.src"))
- doAs("bob", sql(s"INSERT INTO default.src3 SELECT key, value1 as v from default.src"))
- checkAnswer("bob", "SELECT value FROM default.src3", Seq(Row(md5Hex("1")), Row(md5Hex("1"))))
+ doAs(bob, sql(s"CREATE TABLE default.src2 (value1 string) $format"))
+ doAs(
+ bob,
+ sql(s"INSERT INTO default.src2 SELECT value1 from default.src " +
+ s"where key = 1"))
+ doAs(
+ bob,
+ sql(s"INSERT INTO default.src2 SELECT value1 as v from default.src " +
+ s"where key = 1"))
+ checkAnswer(bob, "SELECT value1 FROM default.src2", Seq(Row(md5Hex("1")), Row(md5Hex("1"))))
+ doAs(bob, sql(s"CREATE TABLE default.src3 (k int, value string) $format"))
+ doAs(
+ bob,
+ sql(s"INSERT INTO default.src3 SELECT key, value1 from default.src " +
+ s"where key = 1"))
+ doAs(
+ bob,
+ sql(s"INSERT INTO default.src3 SELECT key, value1 as v from default.src " +
+ s"where key = 1"))
+ checkAnswer(bob, "SELECT value FROM default.src3", Seq(Row(md5Hex("1")), Row(md5Hex("1"))))
}
}
test("join on an unmasked table") {
val s = "SELECT a.value1, b.value1 FROM default.src a" +
" join default.unmasked b on a.value1=b.value1"
- checkAnswer("bob", s, Nil)
- checkAnswer("bob", s, Nil) // just for testing query multiple times, don't delete it
+ checkAnswer(bob, s, Nil)
+ checkAnswer(bob, s, Nil) // just for testing query multiple times, don't delete it
}
test("self join on a masked table") {
val s = "SELECT a.value1, b.value1 FROM default.src a" +
- " join default.src b on a.value1=b.value1"
- checkAnswer("bob", s, Seq(Row(md5Hex("1"), md5Hex("1"))))
+ " join default.src b on a.value1=b.value1 where a.key = 1 and b.key = 1 "
+ checkAnswer(bob, s, Seq(Row(md5Hex("1"), md5Hex("1"))))
// just for testing query multiple times, don't delete it
- checkAnswer("bob", s, Seq(Row(md5Hex("1"), md5Hex("1"))))
+ checkAnswer(bob, s, Seq(Row(md5Hex("1"), md5Hex("1"))))
}
test("self join on a masked table and filter the masked column with original value") {
val s = "SELECT a.value1, b.value1 FROM default.src a" +
" join default.src b on a.value1=b.value1" +
" where a.value1='1' and b.value1='1'"
- checkAnswer("bob", s, Nil)
- checkAnswer("bob", s, Nil) // just for testing query multiple times, don't delete it
+ checkAnswer(bob, s, Nil)
+ checkAnswer(bob, s, Nil) // just for testing query multiple times, don't delete it
}
test("self join on a masked table and filter the masked column with masked value") {
@@ -211,7 +247,7 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
// +- DataMaskingStage0Marker Relation default.src[key#60,value1#61,value2#62,value3#63,value4#64,value5#65] parquet
// +- Project [key#153, md5(cast(cast(value1#154 as string) as binary)) AS value1#148, regexp_replace(regexp_replace(regexp_replace(value2#155, [A-Z], X, 1), [a-z], x, 1), [0-9], n, 1) AS value2#149, regexp_replace(regexp_replace(regexp_replace(value3#156, [A-Z], X, 5), [a-z], x, 5), [0-9], n, 5) AS value3#150, date_trunc(YEAR, value4#157, Some(Asia/Shanghai)) AS value4#151, concat(regexp_replace(regexp_replace(regexp_replace(left(value5#158, (length(value5#158) - 4)), [A-Z], X, 1), [a-z], x, 1), [0-9], n, 1), right(value5#158, 4)) AS value5#152]
// +- Relation default.src[key#153,value1#154,value2#155,value3#156,value4#157,value5#158] parquet
- // checkAnswer("bob", s, Seq(Row(md5Hex("1"), md5Hex("1"))))
+ // checkAnswer(bob, s, Seq(Row(md5Hex("1"), md5Hex("1"))))
//
//
// scalastyle:on
@@ -220,44 +256,74 @@ trait DataMaskingTestBase extends AnyFunSuite with SparkSessionProvider with Bef
val s2 = "SELECT a.value1, b.value1 FROM default.src a" +
" join default.src b on a.value1=b.value1" +
s" where a.value2='xxxxx' and b.value2='xxxxx'"
- checkAnswer("bob", s2, Seq(Row(md5Hex("1"), md5Hex("1"))))
+ checkAnswer(bob, s2, Seq(Row(md5Hex("1"), md5Hex("1"))))
// just for testing query multiple times, don't delete it
- checkAnswer("bob", s2, Seq(Row(md5Hex("1"), md5Hex("1"))))
+ checkAnswer(bob, s2, Seq(Row(md5Hex("1"), md5Hex("1"))))
}
test("union an unmasked table") {
val s = """
SELECT value1 from (
- SELECT a.value1 FROM default.src a
+ SELECT a.value1 FROM default.src a where a.key = 1
union
(SELECT b.value1 FROM default.unmasked b)
) c order by value1
"""
- checkAnswer("bob", s, Seq(Row("1"), Row("2"), Row("3"), Row(md5Hex("1"))))
+ checkAnswer(bob, s, Seq(Row("1"), Row("2"), Row("3"), Row("4"), Row("5"), Row(md5Hex("1"))))
}
test("union a masked table") {
- val s = "SELECT a.value1 FROM default.src a union" +
- " (SELECT b.value1 FROM default.src b)"
- checkAnswer("bob", s, Seq(Row(md5Hex("1"))))
+ val s = "SELECT a.value1 FROM default.src a where a.key = 1 union" +
+ " (SELECT b.value1 FROM default.src b where b.key = 1)"
+ checkAnswer(bob, s, Seq(Row(md5Hex("1"))))
}
test("KYUUBI #3581: permanent view should lookup rule on itself not the raw table") {
assume(isSparkV31OrGreater)
val supported = doAs(
- "perm_view_user",
+ permViewUser,
Try(sql("CREATE OR REPLACE VIEW default.perm_view AS SELECT * FROM default.src")).isSuccess)
assume(supported, s"view support for '$format' has not been implemented yet")
withCleanTmpResources(Seq(("default.perm_view", "view"))) {
checkAnswer(
- "perm_view_user",
- "SELECT value1, value2 FROM default.src where key < 20",
+ permViewUser,
+ "SELECT value1, value2 FROM default.src where key = 1",
Seq(Row(1, "hello")))
checkAnswer(
- "perm_view_user",
- "SELECT value1, value2 FROM default.perm_view where key < 20",
+ permViewUser,
+ "SELECT value1, value2 FROM default.perm_view where key = 1",
Seq(Row(md5Hex("1"), "hello")))
}
}
+
+ // This test only includes a small subset of UCS-2 characters.
+ // But in theory, it should work for all characters
+ test("test MASK,MASK_SHOW_FIRST_4,MASK_SHOW_LAST_4 rule with non-English character set") {
+ val s1 = s"SELECT * FROM default.src where key = 10"
+ val s2 = s"SELECT * FROM default.src where key = 11"
+ // scalastyle:off
+ checkAnswer(
+ bob,
+ s1,
+ Seq(Row(
+ 10,
+ md5Hex("4"),
+ "xxxxxUXXXXUnnnUUUUUUXUUUUUUUUUUUUU",
+ "hellxUXXXXUnnnUUUUUUXUUUUUUUUUUUUU",
+ Timestamp.valueOf("2018-01-01 00:00:00"),
+ "xxxxxUXXXXUnnnUUUUUUXUUUUUUUUUア叶葉엽")))
+ checkAnswer(
+ bob,
+ s2,
+ Seq(Row(
+ 11,
+ md5Hex("5"),
+ "XUUUUUUUUUUUUUUxxxxxUXXXXUnnnUUUUU",
+ "AßþΔUUUUUUUUUUUxxxxxUXXXXUnnnUUUUU",
+ Timestamp.valueOf("2018-01-01 00:00:00"),
+ "XUUUUUUUUUUUUUUxxxxxUXXXXUnnnU~!@#")))
+ // scalastyle:on
+ }
+
}
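The non-English masking test added above can be cross-checked outside Spark. A minimal Scala sketch of the MASK and MASK_SHOW_LAST_4 transforms exactly as the assertions encode them — ASCII-only character classes matching the generated `regexp_replace`/`left`/`right` expression; this mirrors what the tests assert, not Kyuubi's actual implementation:

```scala
// Character classes mirror the SQL expression's ASCII-only ranges:
// [A-Z] -> 'X', [a-z] -> 'x', [0-9] -> 'n', everything else (incl. non-ASCII) -> 'U'.
def mask(s: String): String = s.map {
  case c if c >= 'A' && c <= 'Z' => 'X'
  case c if c >= 'a' && c <= 'z' => 'x'
  case c if c >= '0' && c <= '9' => 'n'
  case _                         => 'U'
}

// MASK_SHOW_LAST_4: mask all but the trailing four characters, as in
// concat(regexp_replace(... left(value5, length(value5) - 4) ...), right(value5, 4)).
def maskShowLast4(s: String): String = mask(s.dropRight(4)) + s.takeRight(4)

val value1 = "hello WORD 123 ~!@# AßþΔЙקم๗ቐあア叶葉엽"
assert(mask(value1) == "xxxxxUXXXXUnnnUUUUUUXUUUUUUUUUUUUU")
assert(maskShowLast4(value1) == "xxxxxUXXXXUnnnUUUUUUXUUUUUUUUUア叶葉엽")
```

Note that every non-ASCII letter falls through to `'U'`, which is why the Cyrillic, Hebrew, Arabic, and CJK characters in the test data all mask to `U` while the trailing four characters survive MASK_SHOW_LAST_4 untouched.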
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForIcebergSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForIcebergSuite.scala
index 2120b195221..a93a69662e5 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForIcebergSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForIcebergSuite.scala
@@ -21,6 +21,8 @@ import org.apache.spark.SparkConf
import org.scalatest.Outcome
import org.apache.kyuubi.Utils
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+
class RowFilteringForIcebergSuite extends RowFilteringTestBase {
override protected val extraSparkConf: SparkConf = {
val conf = new SparkConf()
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForJDBCV2Suite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForJDBCV2Suite.scala
index cfdb7dadc46..09ae6a008b5 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForJDBCV2Suite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringForJDBCV2Suite.scala
@@ -24,6 +24,8 @@ import scala.util.Try
import org.apache.spark.SparkConf
import org.scalatest.Outcome
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
+
class RowFilteringForJDBCV2Suite extends RowFilteringTestBase {
override protected val extraSparkConf: SparkConf = {
val conf = new SparkConf()
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringTestBase.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringTestBase.scala
index a73690724e4..8d9561a897e 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringTestBase.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/rowfiltering/RowFilteringTestBase.scala
@@ -24,8 +24,10 @@ import org.apache.spark.sql.{Row, SparkSessionExtensions}
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite
+import org.apache.kyuubi.plugin.spark.authz.RangerTestUsers._
import org.apache.kyuubi.plugin.spark.authz.SparkSessionProvider
import org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
+import org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils._
/**
* Base trait for row filtering tests, derivative classes shall name themselves following:
@@ -47,72 +49,72 @@ trait RowFilteringTestBase extends AnyFunSuite with SparkSessionProvider with Be
}
override def beforeAll(): Unit = {
- doAs("admin", setup())
+ doAs(admin, setup())
super.beforeAll()
}
override def afterAll(): Unit = {
- doAs("admin", cleanup())
+ doAs(admin, cleanup())
spark.stop
super.afterAll()
}
test("user without row filtering rule") {
checkAnswer(
- "kent",
+ kent,
"SELECT key FROM default.src order by key",
Seq(Row(1), Row(20), Row(30)))
}
test("simple query projecting filtering column") {
- checkAnswer("bob", "SELECT key FROM default.src", Seq(Row(1)))
+ checkAnswer(bob, "SELECT key FROM default.src", Seq(Row(1)))
}
test("simple query projecting non filtering column") {
- checkAnswer("bob", "SELECT value FROM default.src", Seq(Row(1)))
+ checkAnswer(bob, "SELECT value FROM default.src", Seq(Row(1)))
}
test("simple query projecting non filtering column with udf max") {
- checkAnswer("bob", "SELECT max(value) FROM default.src", Seq(Row(1)))
+ checkAnswer(bob, "SELECT max(value) FROM default.src", Seq(Row(1)))
}
test("simple query projecting non filtering column with udf coalesce") {
- checkAnswer("bob", "SELECT coalesce(max(value), 1) FROM default.src", Seq(Row(1)))
+ checkAnswer(bob, "SELECT coalesce(max(value), 1) FROM default.src", Seq(Row(1)))
}
test("in subquery") {
checkAnswer(
- "bob",
+ bob,
"SELECT value FROM default.src WHERE value in (SELECT value as key FROM default.src)",
Seq(Row(1)))
}
test("ctas") {
withCleanTmpResources(Seq(("default.src2", "table"))) {
- doAs("bob", sql(s"CREATE TABLE default.src2 $format AS SELECT value FROM default.src"))
+ doAs(bob, sql(s"CREATE TABLE default.src2 $format AS SELECT value FROM default.src"))
val query = "select value from default.src2"
- checkAnswer("admin", query, Seq(Row(1)))
- checkAnswer("bob", query, Seq(Row(1)))
+ checkAnswer(admin, query, Seq(Row(1)))
+ checkAnswer(bob, query, Seq(Row(1)))
}
}
test("[KYUUBI #3581]: row level filter on permanent view") {
assume(isSparkV31OrGreater)
val supported = doAs(
- "perm_view_user",
+ permViewUser,
Try(sql("CREATE OR REPLACE VIEW default.perm_view AS SELECT * FROM default.src")).isSuccess)
assume(supported, s"view support for '$format' has not been implemented yet")
withCleanTmpResources(Seq((s"default.perm_view", "view"))) {
checkAnswer(
- "admin",
+ admin,
"SELECT key FROM default.perm_view order by key",
Seq(Row(1), Row(20), Row(30)))
- checkAnswer("bob", "SELECT key FROM default.perm_view", Seq(Row(1)))
- checkAnswer("bob", "SELECT value FROM default.perm_view", Seq(Row(1)))
- checkAnswer("bob", "SELECT max(value) FROM default.perm_view", Seq(Row(1)))
- checkAnswer("bob", "SELECT coalesce(max(value), 1) FROM default.perm_view", Seq(Row(1)))
+ checkAnswer(bob, "SELECT key FROM default.perm_view", Seq(Row(1)))
+ checkAnswer(bob, "SELECT value FROM default.perm_view", Seq(Row(1)))
+ checkAnswer(bob, "SELECT max(value) FROM default.perm_view", Seq(Row(1)))
+ checkAnswer(bob, "SELECT coalesce(max(value), 1) FROM default.perm_view", Seq(Row(1)))
checkAnswer(
- "bob",
+ bob,
"SELECT value FROM default.perm_view WHERE value in " +
"(SELECT value as key FROM default.perm_view)",
Seq(Row(1)))
diff --git a/extensions/spark/kyuubi-spark-connector-common/pom.xml b/extensions/spark/kyuubi-spark-connector-common/pom.xml
index 1cba0ccdd4b..1fc0f57684e 100644
--- a/extensions/spark/kyuubi-spark-connector-common/pom.xml
+++ b/extensions/spark/kyuubi-spark-connector-common/pom.xml
@@ -21,16 +21,22 @@
    <parent>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
        <relativePath>../../../pom.xml</relativePath>
    </parent>

-    <artifactId>kyuubi-spark-connector-common_2.12</artifactId>
+    <artifactId>kyuubi-spark-connector-common_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
    <name>Kyuubi Spark Connector Common</name>
    <url>https://kyuubi.apache.org/</url>

    <dependencies>
+        <dependency>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
@@ -87,10 +93,21 @@
            <artifactId>scalacheck-1-17_${scala.binary.version}</artifactId>
            <scope>test</scope>
        </dependency>
+
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-1.2-api</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-slf4j-impl</artifactId>
+            <scope>test</scope>
+        </dependency>
    </dependencies>

    <build>
-
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
diff --git a/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SemanticVersion.scala b/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SemanticVersion.scala
deleted file mode 100644
index 200937ca664..00000000000
--- a/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SemanticVersion.scala
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.spark.connector.common
-
-/**
- * Encapsulate a component Spark version for the convenience of version checks.
- * Copy from org.apache.kyuubi.engine.ComponentVersion
- */
-case class SemanticVersion(majorVersion: Int, minorVersion: Int) {
-
- def isVersionAtMost(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor < targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor <= targetMinor
- })
- }
-
- def isVersionAtLeast(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor > targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor >= targetMinor
- })
- }
-
- def isVersionEqualTo(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- runtimeMajor == targetMajor && runtimeMinor == targetMinor)
- }
-
- def compareVersion(
- targetVersionString: String,
- callback: (Int, Int, Int, Int) => Boolean): Boolean = {
- val targetVersion = SemanticVersion(targetVersionString)
- val targetMajor = targetVersion.majorVersion
- val targetMinor = targetVersion.minorVersion
- callback(targetMajor, targetMinor, this.majorVersion, this.minorVersion)
- }
-
- override def toString: String = s"$majorVersion.$minorVersion"
-}
-
-object SemanticVersion {
-
- def apply(versionString: String): SemanticVersion = {
- """^(\d+)\.(\d+)(\..*)?$""".r.findFirstMatchIn(versionString) match {
- case Some(m) =>
- SemanticVersion(m.group(1).toInt, m.group(2).toInt)
- case None =>
- throw new IllegalArgumentException(s"Tried to parse '$versionString' as a project" +
- s" version string, but it could not find the major and minor version numbers.")
- }
- }
-}
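For reviewers: the `SemanticVersion` class deleted above is not lost, it is consolidated into the `kyuubi-util-scala` module that the POM change below adds as a dependency. A minimal standalone sketch of the same parse-and-compare technique follows; the names (`MiniVersion`) are illustrative only and are not the actual `kyuubi-util` API.

```scala
// Sketch of major.minor version parsing and ordering, as the removed class did it.
case class MiniVersion(major: Int, minor: Int) extends Ordered[MiniVersion] {
  // Order by major version first, then minor.
  def compare(that: MiniVersion): Int =
    if (major != that.major) Integer.compare(major, that.major)
    else Integer.compare(minor, that.minor)
}

object MiniVersion {
  // Same shape as the removed regex: capture major and minor, ignore the rest.
  private val Pattern = """^(\d+)\.(\d+)(\..*)?$""".r

  def apply(versionString: String): MiniVersion = versionString match {
    case Pattern(maj, min, _) => MiniVersion(maj.toInt, min.toInt)
    case _ =>
      throw new IllegalArgumentException(s"Cannot parse version: $versionString")
  }
}
```

Extending `Ordered` gives the `>=` / `<=` operators for free, which is what lets call sites like `SPARK_RUNTIME_VERSION >= "3.4"` read naturally.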
diff --git a/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SparkUtils.scala b/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SparkUtils.scala
index c1a659fbf6e..fcb99ebe6a9 100644
--- a/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SparkUtils.scala
+++ b/extensions/spark/kyuubi-spark-connector-common/src/main/scala/org/apache/kyuubi/spark/connector/common/SparkUtils.scala
@@ -19,17 +19,8 @@ package org.apache.kyuubi.spark.connector.common
import org.apache.spark.SPARK_VERSION
-object SparkUtils {
-
- def isSparkVersionAtMost(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtMost(targetVersionString)
- }
+import org.apache.kyuubi.util.SemanticVersion
- def isSparkVersionAtLeast(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtLeast(targetVersionString)
- }
-
- def isSparkVersionEqualTo(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionEqualTo(targetVersionString)
- }
+object SparkUtils {
+ lazy val SPARK_RUNTIME_VERSION: SemanticVersion = SemanticVersion(SPARK_VERSION)
}
diff --git a/extensions/spark/kyuubi-spark-connector-hive/pom.xml b/extensions/spark/kyuubi-spark-connector-hive/pom.xml
index b75db929d50..4f46138e904 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/pom.xml
+++ b/extensions/spark/kyuubi-spark-connector-hive/pom.xml
@@ -21,11 +21,11 @@
    <parent>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
        <relativePath>../../../pom.xml</relativePath>
    </parent>

-    <artifactId>kyuubi-spark-connector-hive_2.12</artifactId>
+    <artifactId>kyuubi-spark-connector-hive_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
    <name>Kyuubi Spark Hive Connector</name>
    <description>A Kyuubi hive connector based on Spark V2 DataSource</description>
@@ -153,7 +153,7 @@
                                <include>com.google.guava:guava</include>
-                                <include>org.apache.kyuubi:kyuubi-spark-connector-common_${scala.binary.version}</include>
+                                <include>org.apache.kyuubi:*</include>
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
new file mode 100644
index 00000000000..615093186a7
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.spark.connector.hive
+
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}
+import org.apache.spark.sql.connector.catalog.TableChange
+import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnPosition, DeleteColumn, First, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType}
+import org.apache.spark.sql.execution.command.CommandUtils
+import org.apache.spark.sql.execution.command.CommandUtils.{calculateMultipleLocationSizes, calculateSingleLocationSize}
+import org.apache.spark.sql.execution.datasources.PartitionedFile
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{ArrayType, MapType, StructField, StructType}
+
+import org.apache.kyuubi.spark.connector.common.SparkUtils.SPARK_RUNTIME_VERSION
+import org.apache.kyuubi.util.reflect.ReflectUtils.invokeAs
+
+object HiveConnectorUtils extends Logging {
+
+ def partitionedFilePath(file: PartitionedFile): String = {
+ if (SPARK_RUNTIME_VERSION >= "3.4") {
+ invokeAs[String](file, "urlEncodedPath")
+ } else if (SPARK_RUNTIME_VERSION >= "3.3") {
+ invokeAs[String](file, "filePath")
+ } else {
+ throw KyuubiHiveConnectorException(s"Spark version $SPARK_VERSION " +
+ s"is not supported by Kyuubi spark hive connector.")
+ }
+ }
+
+ def calculateTotalSize(
+ spark: SparkSession,
+ catalogTable: CatalogTable,
+ hiveTableCatalog: HiveTableCatalog): (BigInt, Seq[CatalogTablePartition]) = {
+ val sessionState = spark.sessionState
+ val startTime = System.nanoTime()
+ val (totalSize, newPartitions) = if (catalogTable.partitionColumnNames.isEmpty) {
+ (
+ calculateSingleLocationSize(
+ sessionState,
+ catalogTable.identifier,
+ catalogTable.storage.locationUri),
+ Seq())
+ } else {
+ // Calculate table size as a sum of the visible partitions. See SPARK-21079
+ val partitions = hiveTableCatalog.listPartitions(catalogTable.identifier)
+ logInfo(s"Starting to calculate sizes for ${partitions.length} partitions.")
+ val paths = partitions.map(_.storage.locationUri)
+ val sizes = calculateMultipleLocationSizes(spark, catalogTable.identifier, paths)
+ val newPartitions = partitions.zipWithIndex.flatMap { case (p, idx) =>
+ val newStats = CommandUtils.compareAndGetNewStats(p.stats, sizes(idx), None)
+ newStats.map(_ => p.copy(stats = newStats))
+ }
+ (sizes.sum, newPartitions)
+ }
+ logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +
+ s" the total size for table ${catalogTable.identifier}.")
+ (totalSize, newPartitions)
+ }
+
+ def applySchemaChanges(schema: StructType, changes: Seq[TableChange]): StructType = {
+ changes.foldLeft(schema) { (schema, change) =>
+ change match {
+ case add: AddColumn =>
+ add.fieldNames match {
+ case Array(name) =>
+ val field = StructField(name, add.dataType, nullable = add.isNullable)
+ val newField = Option(add.comment).map(field.withComment).getOrElse(field)
+ addField(schema, newField, add.position())
+
+ case names =>
+ replace(
+ schema,
+ names.init,
+ parent =>
+ parent.dataType match {
+ case parentType: StructType =>
+ val field = StructField(names.last, add.dataType, nullable = add.isNullable)
+ val newField = Option(add.comment).map(field.withComment).getOrElse(field)
+ Some(parent.copy(dataType = addField(parentType, newField, add.position())))
+
+ case _ =>
+ throw new IllegalArgumentException(s"Not a struct: ${names.init.last}")
+ })
+ }
+
+ case rename: RenameColumn =>
+ replace(
+ schema,
+ rename.fieldNames,
+ field =>
+ Some(StructField(rename.newName, field.dataType, field.nullable, field.metadata)))
+
+ case update: UpdateColumnType =>
+ replace(
+ schema,
+ update.fieldNames,
+ field => Some(field.copy(dataType = update.newDataType)))
+
+ case update: UpdateColumnNullability =>
+ replace(
+ schema,
+ update.fieldNames,
+ field => Some(field.copy(nullable = update.nullable)))
+
+ case update: UpdateColumnComment =>
+ replace(
+ schema,
+ update.fieldNames,
+ field => Some(field.withComment(update.newComment)))
+
+ case update: UpdateColumnPosition =>
+ def updateFieldPos(struct: StructType, name: String): StructType = {
+ val oldField = struct.fields.find(_.name == name).getOrElse {
+ throw new IllegalArgumentException("Field not found: " + name)
+ }
+ val withFieldRemoved = StructType(struct.fields.filter(_ != oldField))
+ addField(withFieldRemoved, oldField, update.position())
+ }
+
+ update.fieldNames() match {
+ case Array(name) =>
+ updateFieldPos(schema, name)
+ case names =>
+ replace(
+ schema,
+ names.init,
+ parent =>
+ parent.dataType match {
+ case parentType: StructType =>
+ Some(parent.copy(dataType = updateFieldPos(parentType, names.last)))
+ case _ =>
+ throw new IllegalArgumentException(s"Not a struct: ${names.init.last}")
+ })
+ }
+
+ case delete: DeleteColumn =>
+ replace(schema, delete.fieldNames, _ => None, delete.ifExists)
+
+ case _ =>
+ // ignore non-schema changes
+ schema
+ }
+ }
+ }
+
+ private def addField(
+ schema: StructType,
+ field: StructField,
+ position: ColumnPosition): StructType = {
+ if (position == null) {
+ schema.add(field)
+ } else if (position.isInstanceOf[First]) {
+ StructType(field +: schema.fields)
+ } else {
+ val afterCol = position.asInstanceOf[After].column()
+ val fieldIndex = schema.fields.indexWhere(_.name == afterCol)
+ if (fieldIndex == -1) {
+ throw new IllegalArgumentException("AFTER column not found: " + afterCol)
+ }
+ val (before, after) = schema.fields.splitAt(fieldIndex + 1)
+ StructType(before ++ (field +: after))
+ }
+ }
+
+ private def replace(
+ struct: StructType,
+ fieldNames: Seq[String],
+ update: StructField => Option[StructField],
+ ifExists: Boolean = false): StructType = {
+
+ val posOpt = fieldNames.zipWithIndex.toMap.get(fieldNames.head)
+ if (posOpt.isEmpty) {
+ if (ifExists) {
+ // We couldn't find the column to replace, but with IF EXISTS, we will silence the error
+ // Currently only DROP COLUMN may pass down the IF EXISTS parameter
+ return struct
+ } else {
+ throw new IllegalArgumentException(s"Cannot find field: ${fieldNames.head}")
+ }
+ }
+
+ val pos = posOpt.get
+ val field = struct.fields(pos)
+ val replacement: Option[StructField] = (fieldNames.tail, field.dataType) match {
+ case (Seq(), _) =>
+ update(field)
+
+ case (names, struct: StructType) =>
+ val updatedType: StructType = replace(struct, names, update, ifExists)
+ Some(StructField(field.name, updatedType, field.nullable, field.metadata))
+
+ case (Seq("key"), map @ MapType(keyType, _, _)) =>
+ val updated = update(StructField("key", keyType, nullable = false))
+ .getOrElse(throw new IllegalArgumentException(s"Cannot delete map key"))
+ Some(field.copy(dataType = map.copy(keyType = updated.dataType)))
+
+ case (Seq("key", names @ _*), map @ MapType(keyStruct: StructType, _, _)) =>
+ Some(field.copy(dataType = map.copy(keyType = replace(keyStruct, names, update, ifExists))))
+
+ case (Seq("value"), map @ MapType(_, mapValueType, isNullable)) =>
+ val updated = update(StructField("value", mapValueType, nullable = isNullable))
+ .getOrElse(throw new IllegalArgumentException(s"Cannot delete map value"))
+ Some(field.copy(dataType = map.copy(
+ valueType = updated.dataType,
+ valueContainsNull = updated.nullable)))
+
+ case (Seq("value", names @ _*), map @ MapType(_, valueStruct: StructType, _)) =>
+ Some(field.copy(dataType = map.copy(valueType =
+ replace(valueStruct, names, update, ifExists))))
+
+ case (Seq("element"), array @ ArrayType(elementType, isNullable)) =>
+ val updated = update(StructField("element", elementType, nullable = isNullable))
+ .getOrElse(throw new IllegalArgumentException(s"Cannot delete array element"))
+ Some(field.copy(dataType = array.copy(
+ elementType = updated.dataType,
+ containsNull = updated.nullable)))
+
+ case (Seq("element", names @ _*), array @ ArrayType(elementStruct: StructType, _)) =>
+ Some(field.copy(dataType = array.copy(elementType =
+ replace(elementStruct, names, update, ifExists))))
+
+ case (names, dataType) =>
+ if (!ifExists) {
+ throw new IllegalArgumentException(
+ s"Cannot find field: ${names.head} in ${dataType.simpleString}")
+ }
+ None
+ }
+
+ val newFields = struct.fields.zipWithIndex.flatMap {
+ case (_, index) if pos == index =>
+ replacement
+ case (other, _) =>
+ Some(other)
+ }
+
+ new StructType(newFields)
+ }
+
+ def withSQLConf[T](pairs: (String, String)*)(f: => T): T = {
+ val conf = SQLConf.get
+ val (keys, values) = pairs.unzip
+ val currentValues = keys.map { key =>
+ if (conf.contains(key)) {
+ Some(conf.getConfString(key))
+ } else {
+ None
+ }
+ }
+ (keys, values).zipped.foreach { (k, v) =>
+ if (SQLConf.isStaticConfigKey(k)) {
+ throw KyuubiHiveConnectorException(s"Cannot modify the value of a static config: $k")
+ }
+ conf.setConfString(k, v)
+ }
+ try f
+ finally {
+ keys.zip(currentValues).foreach {
+ case (key, Some(value)) => conf.setConfString(key, value)
+ case (key, None) => conf.unsetConf(key)
+ }
+ }
+ }
+}
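The `withSQLConf` helper added above follows a set-run-restore pattern: remember each key's prior value (or its absence), apply the overrides, run the body, and restore in a `finally` block. A standalone sketch of that pattern, using a plain mutable `Map` in place of Spark's `SQLConf` (all names here are illustrative):

```scala
import scala.collection.mutable

object ConfDemo {
  val conf: mutable.Map[String, String] = mutable.Map()

  def withConf[T](pairs: (String, String)*)(f: => T): T = {
    // Remember the previous value (or absence) of each key before overriding.
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try f
    finally saved.foreach {
      case (k, Some(old)) => conf(k) = old // restore the prior value
      case (k, None)      => conf.remove(k) // key did not exist before
    }
  }
}
```

Restoring in `finally` guarantees the overrides never leak, even when the body throws, which matters here because the catalog methods wrap user-facing operations.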
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala
index d4e0f5ea204..c128d67f1fb 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala
@@ -36,14 +36,16 @@ import org.apache.spark.sql.catalyst.util.quoteIfNeeded
import org.apache.spark.sql.connector.catalog.{Identifier, NamespaceChange, SupportsNamespaces, Table, TableCatalog, TableChange}
import org.apache.spark.sql.connector.catalog.NamespaceChange.RemoveProperty
import org.apache.spark.sql.connector.expressions.Transform
-import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.command.DDLUtils
import org.apache.spark.sql.hive.HiveUDFExpressionBuilder
import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper._
-import org.apache.spark.sql.internal.StaticSQLConf.CATALOG_IMPLEMENTATION
+import org.apache.spark.sql.internal.{HiveSerDe, SQLConf}
+import org.apache.spark.sql.internal.StaticSQLConf.{CATALOG_IMPLEMENTATION, GLOBAL_TEMP_DATABASE}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap
-import org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.{toCatalogDatabase, CatalogDatabaseHelper, IdentifierHelper, NamespaceHelper}
+import org.apache.kyuubi.spark.connector.hive.HiveConnectorUtils.withSQLConf
+import org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.{getStorageFormatAndProvider, toCatalogDatabase, CatalogDatabaseHelper, IdentifierHelper, NamespaceHelper}
import org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorDelegationTokenProvider.metastoreTokenSignature
/**
@@ -56,6 +58,8 @@ class HiveTableCatalog(sparkSession: SparkSession)
private val externalCatalogManager = ExternalCatalogManager.getOrCreate(sparkSession)
+ private val LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME = "spark.sql.legacy.v1IdentifierNoCatalog"
+
private val sc = sparkSession.sparkContext
private val sessionState = sparkSession.sessionState
@@ -105,7 +109,7 @@ class HiveTableCatalog(sparkSession: SparkSession)
catalogOptions = options
catalog = new HiveSessionCatalog(
externalCatalogBuilder = () => externalCatalog,
- globalTempViewManagerBuilder = () => sparkSession.sharedState.globalTempViewManager,
+ globalTempViewManagerBuilder = () => globalTempViewManager,
metastoreCatalog = new HiveMetastoreCatalog(sparkSession),
functionRegistry = sessionState.functionRegistry,
tableFunctionRegistry = sessionState.tableFunctionRegistry,
@@ -115,6 +119,17 @@ class HiveTableCatalog(sparkSession: SparkSession)
HiveUDFExpressionBuilder)
}
+ private lazy val globalTempViewManager: GlobalTempViewManager = {
+ val globalTempDB = conf.getConf(GLOBAL_TEMP_DATABASE)
+ if (externalCatalog.databaseExists(globalTempDB)) {
+ throw KyuubiHiveConnectorException(
+ s"$globalTempDB is a system preserved database, please rename your existing database to " +
+ s"resolve the name conflict, or set a different value for ${GLOBAL_TEMP_DATABASE.key}, " +
+ "and launch your Spark application again.")
+ }
+ new GlobalTempViewManager(globalTempDB)
+ }
+
/**
* A catalog that interacts with external systems.
*/
@@ -132,129 +147,139 @@ class HiveTableCatalog(sparkSession: SparkSession)
override val defaultNamespace: Array[String] = Array("default")
- override def listTables(namespace: Array[String]): Array[Identifier] = {
- namespace match {
- case Array(db) =>
- catalog
- .listTables(db)
- .map(ident => Identifier.of(ident.database.map(Array(_)).getOrElse(Array()), ident.table))
- .toArray
- case _ =>
- throw new NoSuchNamespaceException(namespace)
+ override def listTables(namespace: Array[String]): Array[Identifier] =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array(db) =>
+ catalog
+ .listTables(db)
+ .map(ident =>
+ Identifier.of(ident.database.map(Array(_)).getOrElse(Array()), ident.table))
+ .toArray
+ case _ =>
+ throw new NoSuchNamespaceException(namespace)
+ }
}
- }
- override def loadTable(ident: Identifier): Table = {
- HiveTable(sparkSession, catalog.getTableMetadata(ident.asTableIdentifier), this)
- }
+ override def loadTable(ident: Identifier): Table =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ HiveTable(sparkSession, catalog.getTableMetadata(ident.asTableIdentifier), this)
+ }
override def createTable(
ident: Identifier,
schema: StructType,
partitions: Array[Transform],
- properties: util.Map[String, String]): Table = {
- import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.TransformHelper
- val (partitionColumns, maybeBucketSpec) = partitions.toSeq.convertTransforms
- val provider = properties.getOrDefault(TableCatalog.PROP_PROVIDER, conf.defaultDataSourceName)
- val tableProperties = properties.asScala
- val location = Option(properties.get(TableCatalog.PROP_LOCATION))
- val storage = DataSource.buildStorageFormatFromOptions(toOptions(tableProperties.toMap))
- .copy(locationUri = location.map(CatalogUtils.stringToURI))
- val isExternal = properties.containsKey(TableCatalog.PROP_EXTERNAL)
- val tableType =
- if (isExternal || location.isDefined) {
- CatalogTableType.EXTERNAL
- } else {
- CatalogTableType.MANAGED
+ properties: util.Map[String, String]): Table =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.TransformHelper
+ val (partitionColumns, maybeBucketSpec) = partitions.toSeq.convertTransforms
+ val location = Option(properties.get(TableCatalog.PROP_LOCATION))
+ val maybeProvider = Option(properties.get(TableCatalog.PROP_PROVIDER))
+ val (storage, provider) =
+ getStorageFormatAndProvider(
+ maybeProvider,
+ location,
+ properties.asScala.toMap)
+ val tableProperties = properties.asScala
+ val isExternal = properties.containsKey(TableCatalog.PROP_EXTERNAL)
+ val tableType =
+ if (isExternal || location.isDefined) {
+ CatalogTableType.EXTERNAL
+ } else {
+ CatalogTableType.MANAGED
+ }
+
+ val tableDesc = CatalogTable(
+ identifier = ident.asTableIdentifier,
+ tableType = tableType,
+ storage = storage,
+ schema = schema,
+ provider = Some(provider),
+ partitionColumnNames = partitionColumns,
+ bucketSpec = maybeBucketSpec,
+ properties = tableProperties.toMap,
+ tracksPartitionsInCatalog = conf.manageFilesourcePartitions,
+ comment = Option(properties.get(TableCatalog.PROP_COMMENT)))
+
+ try {
+ catalog.createTable(tableDesc, ignoreIfExists = false)
+ } catch {
+ case _: TableAlreadyExistsException =>
+ throw new TableAlreadyExistsException(ident)
}
- val tableDesc = CatalogTable(
- identifier = ident.asTableIdentifier,
- tableType = tableType,
- storage = storage,
- schema = schema,
- provider = Some(provider),
- partitionColumnNames = partitionColumns,
- bucketSpec = maybeBucketSpec,
- properties = tableProperties.toMap,
- tracksPartitionsInCatalog = conf.manageFilesourcePartitions,
- comment = Option(properties.get(TableCatalog.PROP_COMMENT)))
-
- try {
- catalog.createTable(tableDesc, ignoreIfExists = false)
- } catch {
- case _: TableAlreadyExistsException =>
- throw new TableAlreadyExistsException(ident)
+ loadTable(ident)
}
- loadTable(ident)
- }
+ override def alterTable(ident: Identifier, changes: TableChange*): Table =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ val catalogTable =
+ try {
+ catalog.getTableMetadata(ident.asTableIdentifier)
+ } catch {
+ case _: NoSuchTableException =>
+ throw new NoSuchTableException(ident)
+ }
+
+ val properties = CatalogV2Util.applyPropertiesChanges(catalogTable.properties, changes)
+ val schema = HiveConnectorUtils.applySchemaChanges(
+ catalogTable.schema,
+ changes)
+ val comment = properties.get(TableCatalog.PROP_COMMENT)
+ val owner = properties.getOrElse(TableCatalog.PROP_OWNER, catalogTable.owner)
+ val location = properties.get(TableCatalog.PROP_LOCATION).map(CatalogUtils.stringToURI)
+ val storage =
+ if (location.isDefined) {
+ catalogTable.storage.copy(locationUri = location)
+ } else {
+ catalogTable.storage
+ }
- override def alterTable(ident: Identifier, changes: TableChange*): Table = {
- val catalogTable =
try {
- catalog.getTableMetadata(ident.asTableIdentifier)
+ catalog.alterTable(
+ catalogTable.copy(
+ properties = properties,
+ schema = schema,
+ owner = owner,
+ comment = comment,
+ storage = storage))
} catch {
case _: NoSuchTableException =>
throw new NoSuchTableException(ident)
}
- val properties = CatalogV2Util.applyPropertiesChanges(catalogTable.properties, changes)
- val schema = CatalogV2Util.applySchemaChanges(
- catalogTable.schema,
- changes)
- val comment = properties.get(TableCatalog.PROP_COMMENT)
- val owner = properties.getOrElse(TableCatalog.PROP_OWNER, catalogTable.owner)
- val location = properties.get(TableCatalog.PROP_LOCATION).map(CatalogUtils.stringToURI)
- val storage =
- if (location.isDefined) {
- catalogTable.storage.copy(locationUri = location)
- } else {
- catalogTable.storage
- }
-
- try {
- catalog.alterTable(
- catalogTable.copy(
- properties = properties,
- schema = schema,
- owner = owner,
- comment = comment,
- storage = storage))
- } catch {
- case _: NoSuchTableException =>
- throw new NoSuchTableException(ident)
+ loadTable(ident)
}
- loadTable(ident)
- }
-
- override def dropTable(ident: Identifier): Boolean = {
- try {
- if (loadTable(ident) != null) {
- catalog.dropTable(
- ident.asTableIdentifier,
- ignoreIfNotExists = true,
- purge = true /* skip HDFS trash */ )
- true
- } else {
- false
+ override def dropTable(ident: Identifier): Boolean =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ try {
+ if (loadTable(ident) != null) {
+ catalog.dropTable(
+ ident.asTableIdentifier,
+ ignoreIfNotExists = true,
+ purge = true /* skip HDFS trash */ )
+ true
+ } else {
+ false
+ }
+ } catch {
+ case _: NoSuchTableException =>
+ false
}
- } catch {
- case _: NoSuchTableException =>
- false
}
- }
- override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = {
- if (tableExists(newIdent)) {
- throw new TableAlreadyExistsException(newIdent)
- }
+ override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ if (tableExists(newIdent)) {
+ throw new TableAlreadyExistsException(newIdent)
+ }
- // Load table to make sure the table exists
- loadTable(oldIdent)
- catalog.renameTable(oldIdent.asTableIdentifier, newIdent.asTableIdentifier)
- }
+ // Load table to make sure the table exists
+ loadTable(oldIdent)
+ catalog.renameTable(oldIdent.asTableIdentifier, newIdent.asTableIdentifier)
+ }
private def toOptions(properties: Map[String, String]): Map[String, String] = {
properties.filterKeys(_.startsWith(TableCatalog.OPTION_PREFIX)).map {
@@ -262,70 +287,78 @@ class HiveTableCatalog(sparkSession: SparkSession)
}.toMap
}
- override def listNamespaces(): Array[Array[String]] = {
- catalog.listDatabases().map(Array(_)).toArray
- }
-
- override def listNamespaces(namespace: Array[String]): Array[Array[String]] = {
- namespace match {
- case Array() =>
- listNamespaces()
- case Array(db) if catalog.databaseExists(db) =>
- Array()
- case _ =>
- throw new NoSuchNamespaceException(namespace)
+ override def listNamespaces(): Array[Array[String]] =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ catalog.listDatabases().map(Array(_)).toArray
}
- }
- override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] = {
- namespace match {
- case Array(db) =>
- try {
- catalog.getDatabaseMetadata(db).toMetadata
- } catch {
- case _: NoSuchDatabaseException =>
- throw new NoSuchNamespaceException(namespace)
- }
+ override def listNamespaces(namespace: Array[String]): Array[Array[String]] =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array() =>
+ listNamespaces()
+ case Array(db) if catalog.databaseExists(db) =>
+ Array()
+ case _ =>
+ throw new NoSuchNamespaceException(namespace)
+ }
+ }
- case _ =>
- throw new NoSuchNamespaceException(namespace)
+ override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array(db) =>
+ try {
+ catalog.getDatabaseMetadata(db).toMetadata
+ } catch {
+ case _: NoSuchDatabaseException =>
+ throw new NoSuchNamespaceException(namespace)
+ }
+
+ case _ =>
+ throw new NoSuchNamespaceException(namespace)
+ }
}
- }
override def createNamespace(
namespace: Array[String],
- metadata: util.Map[String, String]): Unit = namespace match {
- case Array(db) if !catalog.databaseExists(db) =>
- catalog.createDatabase(
- toCatalogDatabase(db, metadata, defaultLocation = Some(catalog.getDefaultDBPath(db))),
- ignoreIfExists = false)
-
- case Array(_) =>
- throw new NamespaceAlreadyExistsException(namespace)
-
- case _ =>
- throw new IllegalArgumentException(s"Invalid namespace name: ${namespace.quoted}")
- }
-
- override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit = {
- namespace match {
- case Array(db) =>
- // validate that this catalog's reserved properties are not removed
- changes.foreach {
- case remove: RemoveProperty if NAMESPACE_RESERVED_PROPERTIES.contains(remove.property) =>
- throw new UnsupportedOperationException(
- s"Cannot remove reserved property: ${remove.property}")
- case _ =>
- }
-
- val metadata = catalog.getDatabaseMetadata(db).toMetadata
- catalog.alterDatabase(
- toCatalogDatabase(db, CatalogV2Util.applyNamespaceChanges(metadata, changes)))
+ metadata: util.Map[String, String]): Unit =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array(db) if !catalog.databaseExists(db) =>
+ catalog.createDatabase(
+ toCatalogDatabase(db, metadata, defaultLocation = Some(catalog.getDefaultDBPath(db))),
+ ignoreIfExists = false)
+
+ case Array(_) =>
+ throw new NamespaceAlreadyExistsException(namespace)
+
+ case _ =>
+ throw new IllegalArgumentException(s"Invalid namespace name: ${namespace.quoted}")
+ }
+ }
- case _ =>
- throw new NoSuchNamespaceException(namespace)
+ override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array(db) =>
+ // validate that this catalog's reserved properties are not removed
+ changes.foreach {
+ case remove: RemoveProperty
+ if NAMESPACE_RESERVED_PROPERTIES.contains(remove.property) =>
+ throw new UnsupportedOperationException(
+ s"Cannot remove reserved property: ${remove.property}")
+ case _ =>
+ }
+
+ val metadata = catalog.getDatabaseMetadata(db).toMetadata
+ catalog.alterDatabase(
+ toCatalogDatabase(db, CatalogV2Util.applyNamespaceChanges(metadata, changes)))
+
+ case _ =>
+ throw new NoSuchNamespaceException(namespace)
+ }
}
- }
/**
* List the metadata of partitions that belong to the specified table, assuming it exists, that
@@ -345,24 +378,24 @@ class HiveTableCatalog(sparkSession: SparkSession)
override def dropNamespace(
namespace: Array[String],
- cascade: Boolean): Boolean = namespace match {
- case Array(db) if catalog.databaseExists(db) =>
- if (catalog.listTables(db).nonEmpty && !cascade) {
- throw new IllegalStateException(s"Namespace ${namespace.quoted} is not empty")
+ cascade: Boolean): Boolean =
+ withSQLConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME -> "true") {
+ namespace match {
+ case Array(db) if catalog.databaseExists(db) =>
+ catalog.dropDatabase(db, ignoreIfNotExists = false, cascade)
+ true
+
+ case Array(_) =>
+ // exists returned false
+ false
+
+ case _ =>
+ throw new NoSuchNamespaceException(namespace)
}
- catalog.dropDatabase(db, ignoreIfNotExists = false, cascade)
- true
-
- case Array(_) =>
- // exists returned false
- false
-
- case _ =>
- throw new NoSuchNamespaceException(namespace)
- }
+ }
}
-private object HiveTableCatalog {
+private object HiveTableCatalog extends Logging {
private def toCatalogDatabase(
db: String,
metadata: util.Map[String, String],
@@ -378,6 +411,70 @@ private object HiveTableCatalog {
Seq(SupportsNamespaces.PROP_COMMENT, SupportsNamespaces.PROP_LOCATION))
}
+ private def getStorageFormatAndProvider(
+ provider: Option[String],
+ location: Option[String],
+ options: Map[String, String]): (CatalogStorageFormat, String) = {
+ val nonHiveStorageFormat = CatalogStorageFormat.empty.copy(
+ locationUri = location.map(CatalogUtils.stringToURI),
+ properties = options)
+
+ val conf = SQLConf.get
+ val defaultHiveStorage = HiveSerDe.getDefaultStorage(conf).copy(
+ locationUri = location.map(CatalogUtils.stringToURI),
+ properties = options)
+
+ if (provider.isDefined) {
+ (nonHiveStorageFormat, provider.get)
+ } else if (serdeIsDefined(options)) {
+ val maybeSerde = options.get("hive.serde")
+ val maybeStoredAs = options.get("hive.stored-as")
+ val maybeInputFormat = options.get("hive.input-format")
+ val maybeOutputFormat = options.get("hive.output-format")
+ val storageFormat = if (maybeStoredAs.isDefined) {
+ // If `STORED AS fileFormat` is used, infer inputFormat, outputFormat and serde from it.
+ HiveSerDe.sourceToSerDe(maybeStoredAs.get) match {
+ case Some(hiveSerde) =>
+ defaultHiveStorage.copy(
+ inputFormat = hiveSerde.inputFormat.orElse(defaultHiveStorage.inputFormat),
+ outputFormat = hiveSerde.outputFormat.orElse(defaultHiveStorage.outputFormat),
+ // User specified serde takes precedence over the one inferred from file format.
+ serde = maybeSerde.orElse(hiveSerde.serde).orElse(defaultHiveStorage.serde),
+ properties = options ++ defaultHiveStorage.properties)
+          case _ => throw KyuubiHiveConnectorException(
+            s"Unsupported STORED AS format ${maybeStoredAs.get}.")
+ }
+ } else {
+ defaultHiveStorage.copy(
+ inputFormat =
+ maybeInputFormat.orElse(defaultHiveStorage.inputFormat),
+ outputFormat =
+ maybeOutputFormat.orElse(defaultHiveStorage.outputFormat),
+ serde = maybeSerde.orElse(defaultHiveStorage.serde),
+ properties = options ++ defaultHiveStorage.properties)
+ }
+ (storageFormat, DDLUtils.HIVE_PROVIDER)
+ } else {
+ val createHiveTableByDefault = conf.getConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT)
+ if (!createHiveTableByDefault) {
+ (nonHiveStorageFormat, conf.defaultDataSourceName)
+ } else {
+ logWarning("A Hive serde table will be created as there is no table provider " +
+ s"specified. You can set ${SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT.key} to false " +
+ "so that native data source table will be created instead.")
+ (defaultHiveStorage, DDLUtils.HIVE_PROVIDER)
+ }
+ }
+ }
+
+ private def serdeIsDefined(options: Map[String, String]): Boolean = {
+ val maybeStoredAs = options.get("hive.stored-as")
+ val maybeInputFormat = options.get("hive.input-format")
+ val maybeOutputFormat = options.get("hive.output-format")
+ val maybeSerde = options.get("hive.serde")
+ maybeStoredAs.isDefined || maybeInputFormat.isDefined ||
+ maybeOutputFormat.isDefined || maybeSerde.isDefined
+ }
+
implicit class NamespaceHelper(namespace: Array[String]) {
def quoted: String = namespace.map(quoteIfNeeded).mkString(".")
}
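The precedence implemented by `getStorageFormatAndProvider` above can be summarized as: an explicit provider wins, otherwise any `hive.*` serde option forces the Hive provider, otherwise the legacy create-hive-table-by-default flag decides. A minimal sketch of that decision, with plain strings and a boolean standing in for `SQLConf` (names here are illustrative, not the real connector API):

```scala
object StoragePrecedenceSketch {
  private val serdeKeys =
    Seq("hive.serde", "hive.stored-as", "hive.input-format", "hive.output-format")

  // Resolve which table provider wins, mirroring the precedence in
  // getStorageFormatAndProvider: explicit provider > any hive.* serde
  // option > the legacy create-hive-table-by-default flag.
  def resolveProvider(
      provider: Option[String],
      options: Map[String, String],
      createHiveTableByDefault: Boolean,
      defaultDataSourceName: String = "parquet"): String =
    provider.getOrElse {
      if (serdeKeys.exists(options.contains)) "hive"
      else if (createHiveTableByDefault) "hive"
      else defaultDataSourceName
    }
}
```

The same ordering explains the warning in the last branch: only when neither a provider nor a serde option is given does the legacy flag pick between a Hive serde table and a native data source table.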
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileIndex.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileIndex.scala
index 82199e6f27e..0d79621f88a 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileIndex.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileIndex.scala
@@ -21,15 +21,14 @@ import java.net.URI
import scala.collection.mutable
+import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}
-import org.apache.hadoop.hive.ql.metadata.{Partition => HivePartition, Table}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.{expressions, InternalRow}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition, ExternalCatalogUtils}
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, BoundReference, Expression, Predicate}
import org.apache.spark.sql.connector.catalog.CatalogPlugin
import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.HiveClientImpl
import org.apache.spark.sql.types.StructType
import org.apache.kyuubi.spark.connector.hive.{HiveTableCatalog, KyuubiHiveConnectorException}
@@ -37,7 +36,7 @@ import org.apache.kyuubi.spark.connector.hive.{HiveTableCatalog, KyuubiHiveConne
class HiveCatalogFileIndex(
sparkSession: SparkSession,
val catalogTable: CatalogTable,
- hiveCatalog: HiveTableCatalog,
+ val hiveCatalog: HiveTableCatalog,
override val sizeInBytes: Long)
extends PartitioningAwareFileIndex(
sparkSession,
@@ -46,18 +45,17 @@ class HiveCatalogFileIndex(
private val table = catalogTable
- private val partPathToBindHivePart: mutable.Map[PartitionPath, HivePartition] = mutable.Map()
+ private val partPathToBindHivePart: mutable.Map[PartitionPath, CatalogTablePartition] =
+ mutable.Map()
private val fileStatusCache = FileStatusCache.getOrCreate(sparkSession)
- private lazy val hiveTable: Table = HiveClientImpl.toHiveTable(table)
-
private val baseLocation: Option[URI] = table.storage.locationUri
override def partitionSchema: StructType = table.partitionSchema
private[hive] def listHiveFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression])
- : (Seq[PartitionDirectory], Map[PartitionDirectory, HivePartition]) = {
+ : (Seq[PartitionDirectory], Map[PartitionDirectory, CatalogTablePartition]) = {
val fileIndex = filterPartitions(partitionFilters)
val partDirs = fileIndex.listFiles(partitionFilters, dataFilters)
val partDirToHivePart = fileIndex.partDirToBindHivePartMap()
@@ -78,15 +76,15 @@ class HiveCatalogFileIndex(
}
val partitions = selectedPartitions.map {
- case BindPartition(catalogTablePartition, hivePartition) =>
+ case BindPartition(catalogTablePartition) =>
val path = new Path(catalogTablePartition.location)
- val fs = path.getFileSystem(hadoopConf)
+ val fs = path.getFileSystem(hiveCatalog.hadoopConfiguration())
val partPath = PartitionPath(
catalogTablePartition.toRow(
partitionSchema,
sparkSession.sessionState.conf.sessionLocalTimeZone),
path.makeQualified(fs.getUri, fs.getWorkingDirectory))
- partPathToBindHivePart += (partPath -> hivePartition)
+ partPathToBindHivePart += (partPath -> catalogTablePartition)
partPath
}
val partitionSpec = PartitionSpec(partitionSchema, partitions)
@@ -99,19 +97,21 @@ class HiveCatalogFileIndex(
userSpecifiedSchema = Some(partitionSpec.partitionColumns),
fileStatusCache = fileStatusCache,
userSpecifiedPartitionSpec = Some(partitionSpec),
- metadataOpsTimeNs = Some(timeNs))
+ metadataOpsTimeNs = Some(timeNs),
+ hadoopConf = hiveCatalog.hadoopConfiguration())
} else {
new HiveInMemoryFileIndex(
sparkSession = sparkSession,
rootPathsSpecified = rootPaths,
parameters = table.properties,
userSpecifiedSchema = None,
- fileStatusCache = fileStatusCache)
+ fileStatusCache = fileStatusCache,
+ hadoopConf = hiveCatalog.hadoopConfiguration())
}
}
private def buildBindPartition(partition: CatalogTablePartition): BindPartition =
- BindPartition(partition, HiveClientImpl.toHivePartition(partition, hiveTable))
+ BindPartition(partition)
override def partitionSpec(): PartitionSpec = {
throw notSupportOperator("partitionSpec")
@@ -139,10 +139,11 @@ class HiveInMemoryFileIndex(
rootPathsSpecified: Seq[Path],
parameters: Map[String, String],
userSpecifiedSchema: Option[StructType],
- partPathToBindHivePart: Map[PartitionPath, HivePartition] = Map.empty,
+ partPathToBindHivePart: Map[PartitionPath, CatalogTablePartition] = Map.empty,
fileStatusCache: FileStatusCache = NoopCache,
userSpecifiedPartitionSpec: Option[PartitionSpec] = None,
- override val metadataOpsTimeNs: Option[Long] = None)
+ override val metadataOpsTimeNs: Option[Long] = None,
+ override protected val hadoopConf: Configuration)
extends InMemoryFileIndex(
sparkSession,
rootPathsSpecified,
@@ -152,7 +153,8 @@ class HiveInMemoryFileIndex(
userSpecifiedPartitionSpec,
metadataOpsTimeNs) {
- private val partDirToBindHivePart: mutable.Map[PartitionDirectory, HivePartition] = mutable.Map()
+ private val partDirToBindHivePart: mutable.Map[PartitionDirectory, CatalogTablePartition] =
+ mutable.Map()
override def listFiles(
partitionFilters: Seq[Expression],
@@ -230,7 +232,7 @@ class HiveInMemoryFileIndex(
!((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))
}
- def partDirToBindHivePartMap(): Map[PartitionDirectory, HivePartition] = {
+ def partDirToBindHivePartMap(): Map[PartitionDirectory, CatalogTablePartition] = {
partDirToBindHivePart.toMap
}
@@ -243,7 +245,7 @@ class HiveInMemoryFileIndex(
}
}
-case class BindPartition(catalogTablePartition: CatalogTablePartition, hivePartition: HivePartition)
+case class BindPartition(catalogTablePartition: CatalogTablePartition)
object HiveTableCatalogFileIndex {
implicit class CatalogHelper(plugin: CatalogPlugin) {
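The `HiveFileIndex` changes above drop the eager `HiveClientImpl.toHivePartition` call: the index now caches Spark's `CatalogTablePartition` and materializes the Hive-side partition only when a reader is built. A hedged sketch of that deferral, using stand-in case classes rather than the real Spark/Hive types:

```scala
// Stand-ins for CatalogTablePartition and Hive's Partition.
final case class CatalogPart(spec: Map[String, String], location: String)
final case class HivePart(spec: Map[String, String], location: String)

object DeferredConversionSketch {
  var conversions = 0

  def toHivePart(p: CatalogPart): HivePart = {
    conversions += 1 // expensive in the real code: touches Hive metastore classes
    HivePart(p.spec, p.location)
  }

  // Listing stores only the lightweight catalog representation...
  def list(parts: Seq[CatalogPart]): Seq[CatalogPart] = parts

  // ...and conversion happens lazily, per selected partition, at read time.
  def buildReader(selected: Option[CatalogPart]): Option[HivePart] =
    selected.map(toHivePart)
}
```

Partitions that are listed but pruned away never pay the conversion cost, which is the point of moving the call out of `buildBindPartition`.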
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionReaderFactory.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionReaderFactory.scala
index 6770f414413..6a2a7f1d6ed 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionReaderFactory.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionReaderFactory.scala
@@ -31,15 +31,18 @@ import org.apache.spark.TaskContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.internal.Logging
import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader}
+import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition
+import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
import org.apache.spark.sql.execution.datasources.{FilePartition, PartitionedFile}
import org.apache.spark.sql.execution.datasources.v2._
-import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.NextIterator
+import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.{HiveClientImpl, NextIterator}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types._
import org.apache.spark.util.SerializableConfiguration
+import org.apache.kyuubi.spark.connector.hive.HiveConnectorUtils
+
case class HivePartitionReaderFactory(
sqlConf: SQLConf,
broadcastHiveConf: Broadcast[SerializableConfiguration],
@@ -47,9 +50,9 @@ case class HivePartitionReaderFactory(
dataSchema: StructType,
readDataSchema: StructType,
partitionSchema: StructType,
- partFileToHivePart: Map[PartitionedFile, HivePartition],
+ partFileToHivePart: Map[PartitionedFile, CatalogTablePartition],
pushedFilters: Array[Filter] = Array.empty)
- extends FilePartitionReaderFactory with Logging {
+ extends PartitionReaderFactory with Logging {
private val charset: String =
sqlConf.getConfString("hive.exec.default.charset", "utf-8")
@@ -57,37 +60,34 @@ case class HivePartitionReaderFactory(
val tableDesc = HiveReader.getTableDec(hiveTable)
val nonPartitionReadDataKeys = HiveReader.toAttributes(readDataSchema)
- override def buildReader(partitionedFile: PartitionedFile): PartitionReader[InternalRow] = {
- throw new UnsupportedOperationException("Cannot use buildReader directly.")
- }
-
override def createReader(partition: InputPartition): PartitionReader[InternalRow] = {
assert(partition.isInstanceOf[FilePartition])
val filePartition = partition.asInstanceOf[FilePartition]
val iter: Iterator[HivePartitionedFileReader[InternalRow]] =
filePartition.files.toIterator.map { file =>
- val bindHivePart = partFileToHivePart.getOrElse(file, null)
+ val bindHivePart = partFileToHivePart.get(file)
+ val hivePartition = bindHivePart.map(HiveClientImpl.toHivePartition(_, hiveTable))
HivePartitionedFileReader(
file,
new PartitionReaderWithPartitionValues(
HivePartitionedReader(
file,
- buildReaderInternal(file, bindHivePart),
+ buildReaderInternal(file, hivePartition),
tableDesc,
broadcastHiveConf,
nonPartitionReadDataKeys,
- bindHivePart,
+ hivePartition,
charset),
readDataSchema,
partitionSchema,
file.partitionValues))
}
- new FilePartitionReader[InternalRow](iter)
+ new SparkFilePartitionReader[InternalRow](iter)
}
- def buildReaderInternal(
+ private def buildReaderInternal(
file: PartitionedFile,
- bindPartition: HivePartition): PartitionReader[Writable] = {
+ bindPartition: Option[HivePartition]): PartitionReader[Writable] = {
val reader = createPartitionWritableReader(file, bindPartition)
val fileReader = new PartitionReader[Writable] {
override def next(): Boolean = reader.hasNext
@@ -99,25 +99,20 @@ case class HivePartitionReaderFactory(
private def createPartitionWritableReader[T](
file: PartitionedFile,
- bindPartition: HivePartition): Iterator[Writable] = {
+ bindPartition: Option[HivePartition]): Iterator[Writable] = {
// Obtain binding HivePartition from input partitioned file
- val partDesc =
- if (bindPartition != null) {
- Utilities.getPartitionDesc(bindPartition)
- } else null
-
- val ifc =
- if (partDesc == null) {
- hiveTable.getInputFormatClass
- .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
- } else {
+ val ifc = bindPartition.map(Utilities.getPartitionDesc) match {
+ case Some(partDesc) =>
partDesc.getInputFileFormatClass
.asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
- }
+ case None =>
+ hiveTable.getInputFormatClass
+ .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
+ }
val jobConf = new JobConf(broadcastHiveConf.value.value)
- val filePath = new Path(new URI(file.filePath))
+ val filePath = new Path(new URI(HiveConnectorUtils.partitionedFilePath(file)))
if (tableDesc != null) {
configureJobPropertiesForStorageHandler(tableDesc, jobConf, true)
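The null-to-`Option` cleanup in `createPartitionWritableReader` above follows a simple pattern: the input format used to be chosen by branching on `bindPartition != null`; with `Option`, the fallback chain is a pattern match. A minimal model, with strings standing in for the real `InputFormat` classes:

```scala
// Pick the partition-level input format when one is bound, otherwise
// fall back to the table-level format — the same shape as the
// bindPartition.map(Utilities.getPartitionDesc) match above.
def inputFormatFor(
    partitionFormat: Option[String],
    tableFormat: String): String =
  partitionFormat match {
    case Some(fmt) => fmt         // partition-level format takes precedence
    case None      => tableFormat // fall back to the table's input format
  }
```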
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionedReader.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionedReader.scala
index 4c169052473..732643eb149 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionedReader.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionedReader.scala
@@ -19,7 +19,6 @@ package org.apache.kyuubi.spark.connector.hive.read
import java.util.Properties
-import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.metadata.{Partition => HivePartition}
import org.apache.hadoop.hive.ql.plan.TableDesc
import org.apache.hadoop.hive.serde2.Deserializer
@@ -43,30 +42,24 @@ case class HivePartitionedReader(
tableDesc: TableDesc,
broadcastHiveConf: Broadcast[SerializableConfiguration],
nonPartitionReadDataKeys: Seq[Attribute],
- bindPartition: HivePartition,
+ bindPartitionOpt: Option[HivePartition],
charset: String = "utf-8") extends PartitionReader[InternalRow] with Logging {
- private val partDesc =
- if (bindPartition != null) {
- Utilities.getPartitionDesc(bindPartition)
- } else null
private val hiveConf = broadcastHiveConf.value.value
private val tableDeser = tableDesc.getDeserializerClass.newInstance()
tableDeser.initialize(hiveConf, tableDesc.getProperties)
- private val localDeser: Deserializer =
- if (bindPartition != null &&
- bindPartition.getDeserializer != null) {
+ private val localDeser: Deserializer = bindPartitionOpt match {
+ case Some(bindPartition) if bindPartition.getDeserializer != null =>
val tableProperties = tableDesc.getProperties
val props = new Properties(tableProperties)
val deserializer =
bindPartition.getDeserializer.getClass.asInstanceOf[Class[Deserializer]].newInstance()
deserializer.initialize(hiveConf, props)
deserializer
- } else {
- tableDeser
- }
+ case _ => tableDeser
+ }
private val internalRow = new SpecificInternalRow(nonPartitionReadDataKeys.map(_.dataType))
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveScan.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveScan.scala
index 64fcf23f889..0b79d730751 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveScan.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveScan.scala
@@ -23,9 +23,8 @@ import scala.collection.mutable
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
-import org.apache.hadoop.hive.ql.metadata.{Partition => HivePartition}
import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTablePartition}
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
import org.apache.spark.sql.connector.read.PartitionReaderFactory
@@ -37,7 +36,7 @@ import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType
import org.apache.spark.util.SerializableConfiguration
-import org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorException
+import org.apache.kyuubi.spark.connector.hive.{HiveConnectorUtils, KyuubiHiveConnectorException}
case class HiveScan(
sparkSession: SparkSession,
@@ -52,10 +51,20 @@ case class HiveScan(
private val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
- private val partFileToHivePartMap: mutable.Map[PartitionedFile, HivePartition] = mutable.Map()
+ private val partFileToHivePartMap: mutable.Map[PartitionedFile, CatalogTablePartition] =
+ mutable.Map()
+
+ override def isSplitable(path: Path): Boolean = {
+ catalogTable.provider.map(_.toUpperCase(Locale.ROOT)).exists {
+ case "PARQUET" => true
+ case "ORC" => true
+ case "HIVE" => isHiveOrcOrParquet(catalogTable.storage)
+ case _ => super.isSplitable(path)
+ }
+ }
override def createReaderFactory(): PartitionReaderFactory = {
- val hiveConf = sparkSession.sessionState.newHadoopConf()
+ val hiveConf = fileIndex.hiveCatalog.hadoopConfiguration()
addCatalogTableConfToConf(hiveConf, catalogTable)
val table = HiveClientImpl.toHiveTable(catalogTable)
@@ -88,7 +97,7 @@ case class HiveScan(
}
lazy val partitionValueProject =
GenerateUnsafeProjection.generate(readPartitionAttributes, partitionAttributes)
- val splitFiles = selectedPartitions.flatMap { partition =>
+ val splitFiles: Seq[PartitionedFile] = selectedPartitions.flatMap { partition =>
val partitionValues =
if (readPartitionAttributes != partitionAttributes) {
partitionValueProject(partition.values).copy()
@@ -115,7 +124,7 @@ case class HiveScan(
}
if (splitFiles.length == 1) {
- val path = new Path(splitFiles(0).filePath)
+ val path = new Path(HiveConnectorUtils.partitionedFilePath(splitFiles(0)))
if (!isSplitable(path) && splitFiles(0).length >
sparkSession.sparkContext.getConf.getOption("spark.io.warning.largeFileThreshold")
.getOrElse("1024000000").toLong) {
@@ -142,6 +151,11 @@ case class HiveScan(
}
}
+ private def isHiveOrcOrParquet(storage: CatalogStorageFormat): Boolean = {
+ val serde = storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
+ serde.contains("parquet") || serde.contains("orc")
+ }
+
def toAttributes(structType: StructType): Seq[AttributeReference] =
structType.map(f => AttributeReference(f.name, f.dataType, f.nullable, f.metadata)())
}
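The `isSplitable` override added to `HiveScan` above reads: a scan is splittable for native PARQUET/ORC providers, and for HIVE-provider tables only when the serde string mentions parquet or orc. An illustrative restatement over simplified inputs (the real code additionally delegates unknown providers to the superclass, which this sketch collapses to `false`):

```scala
import java.util.Locale

// Splittable iff the provider is a columnar native format, or a Hive table
// whose serde is backed by one of those formats.
def isSplittable(provider: Option[String], serde: Option[String]): Boolean =
  provider.map(_.toUpperCase(Locale.ROOT)).exists {
    case "PARQUET" | "ORC" => true
    case "HIVE" =>
      val s = serde.getOrElse("").toLowerCase(Locale.ROOT)
      s.contains("parquet") || s.contains("orc")
    case _ => false
  }
```

This is what makes the large-file warning later in `partitions()` meaningful: a single unsplittable file above the `spark.io.warning.largeFileThreshold` limit must be read by one task.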
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/FilePartitionReader.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/SparkFilePartitionReader.scala
similarity index 92%
rename from extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/FilePartitionReader.scala
rename to extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/SparkFilePartitionReader.scala
index d0cd680d479..f785694d125 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/FilePartitionReader.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/SparkFilePartitionReader.scala
@@ -26,12 +26,14 @@ import org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupporte
import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.InputFileBlockHolder
import org.apache.spark.sql.internal.SQLConf
+import org.apache.kyuubi.spark.connector.hive.HiveConnectorUtils
+
// scalastyle:off line.size.limit
// copy from https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala
// scalastyle:on line.size.limit
-class FilePartitionReader[T](readers: Iterator[HivePartitionedFileReader[T]])
+class SparkFilePartitionReader[T](readers: Iterator[HivePartitionedFileReader[T]])
extends PartitionReader[T] with Logging {
- private var currentReader: HivePartitionedFileReader[T] = null
+ private var currentReader: HivePartitionedFileReader[T] = _
private val sqlConf = SQLConf.get
private def ignoreMissingFiles = sqlConf.ignoreMissingFiles
@@ -98,7 +100,10 @@ class FilePartitionReader[T](readers: Iterator[HivePartitionedFileReader[T]])
logInfo(s"Reading file $reader")
// Sets InputFileBlockHolder for the file block's information
val file = reader.file
- InputFileBlockHolder.set(file.filePath, file.start, file.length)
+ InputFileBlockHolder.set(
+ HiveConnectorUtils.partitionedFilePath(file),
+ file.start,
+ file.length)
reader
}
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveBatchWrite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveBatchWrite.scala
index 625d79d0c7e..d12fc0efcc0 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveBatchWrite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveBatchWrite.scala
@@ -28,13 +28,12 @@ import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog._
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
import org.apache.spark.sql.connector.write.{BatchWrite, DataWriterFactory, PhysicalWriteInfo, WriterCommitMessage}
-import org.apache.spark.sql.execution.command.CommandUtils
import org.apache.spark.sql.execution.datasources.{WriteJobDescription, WriteTaskResult}
import org.apache.spark.sql.execution.datasources.v2.FileBatchWrite
import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.{hive, toSQLValue, HiveExternalCatalog}
import org.apache.spark.sql.types.StringType
-import org.apache.kyuubi.spark.connector.hive.{HiveTableCatalog, KyuubiHiveConnectorException}
+import org.apache.kyuubi.spark.connector.hive.{HiveConnectorUtils, HiveTableCatalog, KyuubiHiveConnectorException}
import org.apache.kyuubi.spark.connector.hive.write.HiveWriteHelper.getPartitionSpec
class HiveBatchWrite(
@@ -77,7 +76,8 @@ class HiveBatchWrite(
val catalog = hiveTableCatalog.catalog
if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
val newTable = catalog.getTableMetadata(table.identifier)
- val newSize = CommandUtils.calculateTotalSize(sparkSession, newTable)
+ val (newSize, _) =
+ HiveConnectorUtils.calculateTotalSize(sparkSession, newTable, hiveTableCatalog)
val newStats = CatalogStatistics(sizeInBytes = newSize)
catalog.alterTableStats(table.identifier, Some(newStats))
} else if (table.stats.nonEmpty) {
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWrite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWrite.scala
index 62db1fa0afb..2ee3386738f 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWrite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWrite.scala
@@ -76,7 +76,7 @@ case class HiveWrite(
override def description(): String = "Kyuubi-Hive-Connector"
override def toBatch: BatchWrite = {
- val tmpLocation = HiveWriteHelper.getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
+ val tmpLocation = HiveWriteHelper.getExternalTmpPath(externalCatalog, hadoopConf, tableLocation)
val fileSinkConf = new FileSinkDesc(tmpLocation.toString, tableDesc, false)
handleCompression(fileSinkConf, hadoopConf)
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWriteHelper.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWriteHelper.scala
index 68ba0bfb223..25bca911fff 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWriteHelper.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/HiveWriteHelper.scala
@@ -27,8 +27,8 @@ import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hive.common.FileUtils
import org.apache.hadoop.hive.ql.exec.TaskRunner
import org.apache.spark.internal.Logging
-import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils
+import org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.{hive, HiveExternalCatalog, HiveVersion}
import org.apache.kyuubi.spark.connector.hive.KyuubiHiveConnectorException
@@ -47,7 +47,7 @@ object HiveWriteHelper extends Logging {
private val hiveScratchDir = "hive.exec.scratchdir"
def getExternalTmpPath(
- sparkSession: SparkSession,
+ externalCatalog: ExternalCatalogWithListener,
hadoopConf: Configuration,
path: Path): Path = {
@@ -70,7 +70,6 @@ object HiveWriteHelper extends Logging {
assert(hiveVersionsUsingNewExternalTempPath ++ hiveVersionsUsingOldExternalTempPath ==
allSupportedHiveVersions)
- val externalCatalog = sparkSession.sharedState.externalCatalog
val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
val stagingDir = hadoopConf.get(hiveStagingDir, ".hive-staging")
val scratchDir = hadoopConf.get(hiveScratchDir, "/tmp/hive")
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/spark/sql/hive/kyuubi/connector/HiveBridgeHelper.scala b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/spark/sql/hive/kyuubi/connector/HiveBridgeHelper.scala
index 349edd327e1..8e3a9cd3dae 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/spark/sql/hive/kyuubi/connector/HiveBridgeHelper.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/spark/sql/hive/kyuubi/connector/HiveBridgeHelper.scala
@@ -47,6 +47,7 @@ object HiveBridgeHelper {
val HadoopTableReader = org.apache.spark.sql.hive.HadoopTableReader
val SparkHadoopUtil = org.apache.spark.deploy.SparkHadoopUtil
val Utils = org.apache.spark.util.Utils
+ val CatalogV2Implicits = org.apache.spark.sql.connector.catalog.CatalogV2Implicits
def postExternalCatalogEvent(sc: SparkContext, event: ExternalCatalogEvent): Unit = {
sc.listenerBus.post(event)
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/ExternalCatalogPoolSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/ExternalCatalogPoolSuite.scala
index 7c02e8531a8..937e32d6d2a 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/ExternalCatalogPoolSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/ExternalCatalogPoolSuite.scala
@@ -56,11 +56,11 @@ class ExternalCatalogPoolSuite extends KyuubiHiveTest {
val externalCatalog2 = pool.take(catalog2)
assert(externalCatalog1 != externalCatalog2)
- (1 to 10).foreach { id =>
+ (1 to 10).foreach { _ =>
assert(pool.take(catalog1) == externalCatalog1)
}
- (1 to 10).foreach { id =>
+ (1 to 10).foreach { _ =>
assert(pool.take(catalog2) == externalCatalog2)
}
}
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveCatalogSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveCatalogSuite.scala
index 7a1eb86dc77..f43dafd1163 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveCatalogSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveCatalogSuite.scala
@@ -27,14 +27,16 @@ import scala.util.Try
import com.google.common.collect.Maps
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.analysis.{NoSuchNamespaceException, NoSuchTableException, TableAlreadyExistsException}
+import org.apache.spark.sql.catalyst.analysis.{NoSuchNamespaceException, NoSuchTableException, TableAlreadyExistsException, UnresolvedRelation}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.connector.catalog.{Identifier, TableCatalog}
+import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper._
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.IdentifierHelper
+import org.apache.kyuubi.spark.connector.hive.read.HiveScan
class HiveCatalogSuite extends KyuubiHiveTest {
@@ -95,7 +97,7 @@ class HiveCatalogSuite extends KyuubiHiveTest {
}
test("get catalog name") {
- withSparkSession() { spark =>
+ withSparkSession() { _ =>
val catalog = new HiveTableCatalog
val catalogName = "hive"
catalog.initialize(catalogName, CaseInsensitiveStringMap.empty())
@@ -119,7 +121,9 @@ class HiveCatalogSuite extends KyuubiHiveTest {
val exception = intercept[AnalysisException] {
spark.table("hive.ns1.nonexistent_table")
}
- assert(exception.message === "Table or view not found: hive.ns1.nonexistent_table")
+ assert(exception.plan.exists { p =>
+ p.exists(child => child.isInstanceOf[UnresolvedRelation])
+ })
}
}
@@ -131,13 +135,13 @@ class HiveCatalogSuite extends KyuubiHiveTest {
assert(catalog.listTables(Array("ns")).isEmpty)
- catalog.createTable(ident1, schema, Array.empty, emptyProps)
+ catalog.createTable(ident1, schema, Array.empty[Transform], emptyProps)
assert(catalog.listTables(Array("ns")).toSet == Set(ident1))
assert(catalog.listTables(Array("ns2")).isEmpty)
- catalog.createTable(ident3, schema, Array.empty, emptyProps)
- catalog.createTable(ident2, schema, Array.empty, emptyProps)
+ catalog.createTable(ident3, schema, Array.empty[Transform], emptyProps)
+ catalog.createTable(ident2, schema, Array.empty[Transform], emptyProps)
assert(catalog.listTables(Array("ns")).toSet == Set(ident1, ident2))
assert(catalog.listTables(Array("ns2")).toSet == Set(ident3))
@@ -157,7 +161,8 @@ class HiveCatalogSuite extends KyuubiHiveTest {
test("createTable") {
assert(!catalog.tableExists(testIdent))
- val table = catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ val table =
+ catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
val parsed = CatalystSqlParser.parseMultipartIdentifier(table.name)
assert(parsed == Seq("db", "test_table"))
@@ -174,7 +179,7 @@ class HiveCatalogSuite extends KyuubiHiveTest {
assert(!catalog.tableExists(testIdent))
- val table = catalog.createTable(testIdent, schema, Array.empty, properties)
+ val table = catalog.createTable(testIdent, schema, Array.empty[Transform], properties)
val parsed = CatalystSqlParser.parseMultipartIdentifier(table.name)
assert(parsed == Seq("db", "test_table"))
@@ -188,13 +193,13 @@ class HiveCatalogSuite extends KyuubiHiveTest {
test("createTable: table already exists") {
assert(!catalog.tableExists(testIdent))
- val table = catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ val table = catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
val exc = intercept[TableAlreadyExistsException] {
- catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
}
- assert(exc.message.contains(table.name()))
+ assert(exc.message.contains(testIdent.name()))
assert(exc.message.contains("already exists"))
assert(catalog.tableExists(testIdent))
@@ -204,7 +209,7 @@ class HiveCatalogSuite extends KyuubiHiveTest {
test("tableExists") {
assert(!catalog.tableExists(testIdent))
- catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
assert(catalog.tableExists(testIdent))
@@ -215,35 +220,52 @@ class HiveCatalogSuite extends KyuubiHiveTest {
test("createTable: location") {
val properties = new util.HashMap[String, String]()
+ properties.put(TableCatalog.PROP_PROVIDER, "parquet")
assert(!catalog.tableExists(testIdent))
// default location
- val t1 = catalog.createTable(testIdent, schema, Array.empty, properties).asInstanceOf[HiveTable]
+ val t1 = catalog.createTable(
+ testIdent,
+ schema,
+ Array.empty[Transform],
+ properties).asInstanceOf[HiveTable]
assert(t1.catalogTable.location ===
catalog.catalog.defaultTablePath(testIdent.asTableIdentifier))
catalog.dropTable(testIdent)
// relative path
properties.put(TableCatalog.PROP_LOCATION, "relative/path")
- val t2 = catalog.createTable(testIdent, schema, Array.empty, properties).asInstanceOf[HiveTable]
+ val t2 = catalog.createTable(
+ testIdent,
+ schema,
+ Array.empty[Transform],
+ properties).asInstanceOf[HiveTable]
assert(t2.catalogTable.location === makeQualifiedPathWithWarehouse("db.db/relative/path"))
catalog.dropTable(testIdent)
// absolute path without scheme
properties.put(TableCatalog.PROP_LOCATION, "/absolute/path")
- val t3 = catalog.createTable(testIdent, schema, Array.empty, properties).asInstanceOf[HiveTable]
+ val t3 = catalog.createTable(
+ testIdent,
+ schema,
+ Array.empty[Transform],
+ properties).asInstanceOf[HiveTable]
assert(t3.catalogTable.location.toString === "file:/absolute/path")
catalog.dropTable(testIdent)
// absolute path with scheme
properties.put(TableCatalog.PROP_LOCATION, "file:/absolute/path")
- val t4 = catalog.createTable(testIdent, schema, Array.empty, properties).asInstanceOf[HiveTable]
+ val t4 = catalog.createTable(
+ testIdent,
+ schema,
+ Array.empty[Transform],
+ properties).asInstanceOf[HiveTable]
assert(t4.catalogTable.location.toString === "file:/absolute/path")
catalog.dropTable(testIdent)
}
test("loadTable") {
- val table = catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ val table = catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
val loaded = catalog.loadTable(testIdent)
assert(table.name == loaded.name)
@@ -253,15 +275,13 @@ class HiveCatalogSuite extends KyuubiHiveTest {
}
test("loadTable: table does not exist") {
- val exc = intercept[NoSuchTableException] {
+ intercept[NoSuchTableException] {
catalog.loadTable(testIdent)
}
-
- assert(exc.message.contains("Table or view 'test_table' not found in database 'db'"))
}
test("invalidateTable") {
- val table = catalog.createTable(testIdent, schema, Array.empty, emptyProps)
+ val table = catalog.createTable(testIdent, schema, Array.empty[Transform], emptyProps)
// Hive v2 don't cache table
catalog.invalidateTable(testIdent)
@@ -321,4 +341,22 @@ class HiveCatalogSuite extends KyuubiHiveTest {
catalog.dropNamespace(testNs, cascade = false)
}
+
+ test("Support Parquet/Orc provider is splitable") {
+ val parquet_table = Identifier.of(testNs, "parquet_table")
+ val parProps: util.Map[String, String] = new util.HashMap[String, String]()
+ parProps.put(TableCatalog.PROP_PROVIDER, "parquet")
+ val pt = catalog.createTable(parquet_table, schema, Array.empty[Transform], parProps)
+ val parScan = pt.asInstanceOf[HiveTable]
+ .newScanBuilder(CaseInsensitiveStringMap.empty()).build().asInstanceOf[HiveScan]
+ assert(parScan.isSplitable(new Path("empty")))
+
+ val orc_table = Identifier.of(testNs, "orc_table")
+ val orcProps: util.Map[String, String] = new util.HashMap[String, String]()
+ orcProps.put(TableCatalog.PROP_PROVIDER, "orc")
+ val ot = catalog.createTable(orc_table, schema, Array.empty[Transform], orcProps)
+ val orcScan = ot.asInstanceOf[HiveTable]
+ .newScanBuilder(CaseInsensitiveStringMap.empty()).build().asInstanceOf[HiveScan]
+ assert(orcScan.isSplitable(new Path("empty")))
+ }
}
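A recurring edit in this file is `Array.empty` becoming `Array.empty[Transform]`. With no expected type, `Array.empty` infers `Array[Nothing]`, and since arrays are invariant that satisfies no concrete overload — presumably an issue once newer Spark versions add `createTable` overloads. A plain-Scala illustration with hypothetical `describe` overloads:

```scala
// Array.empty alone infers Array[Nothing]; arrays are invariant, so that
// matches neither overload and the call cannot be resolved. Spelling out
// the element type, as the diff does with Array.empty[Transform], pins it.
def describe(partitions: Array[Int]): String = "ints"
def describe(partitions: Array[String]): String = "strings"

// describe(Array.empty)          // does not compile: no overload applies
assert(describe(Array.empty[String]) == "strings")
assert(describe(Array.empty[Int]) == "ints")
```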
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveQuerySuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveQuerySuite.scala
index 16ea032348b..1d3d5ae10aa 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveQuerySuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/HiveQuerySuite.scala
@@ -18,26 +18,35 @@
package org.apache.kyuubi.spark.connector.hive
import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
class HiveQuerySuite extends KyuubiHiveTest {
- def withTempNonPartitionedTable(spark: SparkSession, table: String)(f: => Unit): Unit = {
+ def withTempNonPartitionedTable(
+ spark: SparkSession,
+ table: String,
+ format: String = "PARQUET",
+ hiveTable: Boolean = false)(f: => Unit): Unit = {
spark.sql(
s"""
| CREATE TABLE IF NOT EXISTS
| $table (id String, date String)
- | USING PARQUET
+ | ${if (hiveTable) "STORED AS" else "USING"} $format
|""".stripMargin).collect()
try f
finally spark.sql(s"DROP TABLE $table")
}
- def withTempPartitionedTable(spark: SparkSession, table: String)(f: => Unit): Unit = {
+ def withTempPartitionedTable(
+ spark: SparkSession,
+ table: String,
+ format: String = "PARQUET",
+ hiveTable: Boolean = false)(f: => Unit): Unit = {
spark.sql(
s"""
| CREATE TABLE IF NOT EXISTS
| $table (id String, year String, month string)
- | USING PARQUET
+ | ${if (hiveTable) "STORED AS" else "USING"} $format
| PARTITIONED BY (year, month)
|""".stripMargin).collect()
try f
@@ -70,7 +79,10 @@ class HiveQuerySuite extends KyuubiHiveTest {
| SELECT * FROM hive.ns1.tb1
|""".stripMargin)
}
- assert(e.getMessage().contains("Table or view not found: hive.ns1.tb1"))
+
+ assert(e.plan.exists { p =>
+ p.exists(child => child.isInstanceOf[UnresolvedRelation])
+ })
}
}
}
@@ -182,4 +194,75 @@ class HiveQuerySuite extends KyuubiHiveTest {
}
}
}
+
+ test("read partitioned avro table") {
+ readPartitionedTable("AVRO", true)
+ readPartitionedTable("AVRO", false)
+ }
+
+ test("read un-partitioned avro table") {
+ readUnPartitionedTable("AVRO", true)
+ readUnPartitionedTable("AVRO", false)
+ }
+
+ test("read partitioned textfile table") {
+ readPartitionedTable("TEXTFILE", true)
+ readPartitionedTable("TEXTFILE", false)
+ }
+
+ test("read un-partitioned textfile table") {
+ readUnPartitionedTable("TEXTFILE", true)
+ readUnPartitionedTable("TEXTFILE", false)
+ }
+
+ test("read partitioned SequenceFile table") {
+ readPartitionedTable("SequenceFile", true)
+ readPartitionedTable("SequenceFile", false)
+ }
+
+ test("read un-partitioned SequenceFile table") {
+ readUnPartitionedTable("SequenceFile", true)
+ readUnPartitionedTable("SequenceFile", false)
+ }
+
+ test("read partitioned ORC table") {
+ readPartitionedTable("ORC", true)
+ readPartitionedTable("ORC", false)
+ }
+
+ test("read un-partitioned ORC table") {
+ readUnPartitionedTable("ORC", true)
+ readUnPartitionedTable("ORC", false)
+ }
+
+ private def readPartitionedTable(format: String, hiveTable: Boolean): Unit = {
+ withSparkSession() { spark =>
+ val table = "hive.default.employee"
+ withTempPartitionedTable(spark, table, format, hiveTable) {
+ spark.sql(
+ s"""
+ | INSERT OVERWRITE
+ | $table PARTITION(year = '2023')
+ | VALUES("zhao", "09")
+ |""".stripMargin)
+ checkQueryResult(s"select * from $table", spark, Array(Row.apply("zhao", "2023", "09")))
+ }
+ }
+ }
+
+ private def readUnPartitionedTable(format: String, hiveTable: Boolean): Unit = {
+ withSparkSession() { spark =>
+ val table = "hive.default.employee"
+ withTempNonPartitionedTable(spark, table, format, hiveTable) {
+ spark.sql(
+ s"""
+ | INSERT OVERWRITE
+ | $table
+ | VALUES("zhao", "2023-09-21")
+ |""".stripMargin).collect()
+
+ checkQueryResult(s"select * from $table", spark, Array(Row.apply("zhao", "2023-09-21")))
+ }
+ }
+ }
}
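The reworked helpers switch DDL syntax with a flag: `USING <format>` creates a Spark datasource table, while `STORED AS <format>` creates a Hive SerDe table. The clause selection reduces to a one-line interpolation (sketched here outside Spark, with a hypothetical `createClause` name):

```scala
// "USING <format>" -> Spark datasource table; "STORED AS <format>" -> Hive
// SerDe table. The test helpers pick the syntax per table flavor.
def createClause(format: String, hiveTable: Boolean): String =
  s"${if (hiveTable) "STORED AS" else "USING"} $format"

assert(createClause("PARQUET", hiveTable = false) == "USING PARQUET")
assert(createClause("ORC", hiveTable = true) == "STORED AS ORC")
```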
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveTest.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveTest.scala
index d0b17dc0544..851659b15e9 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveTest.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveTest.scala
@@ -35,7 +35,8 @@ abstract class KyuubiHiveTest extends QueryTest with Logging {
TableCatalog.PROP_PROVIDER,
TableCatalog.PROP_OWNER,
TableCatalog.PROP_EXTERNAL,
- TableCatalog.PROP_IS_MANAGED_LOCATION)
+ TableCatalog.PROP_IS_MANAGED_LOCATION,
+ "transient_lastDdlTime")
protected val NAMESPACE_RESERVED_PROPERTIES =
Seq(
@@ -43,7 +44,7 @@ abstract class KyuubiHiveTest extends QueryTest with Logging {
SupportsNamespaces.PROP_LOCATION,
SupportsNamespaces.PROP_OWNER)
- protected def catalogName: String = "hive"
+ protected val catalogName: String = "hive"
override def beforeEach(): Unit = {
super.beforeAll()
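The `def catalogName` to `val catalogName` change ripples through the suites below because of Scala's override rules: a `val` may override a `def` (or another `val`), but a `def` may never override a `val`. Once the base member becomes a `val`, every subclass override must follow. A minimal sketch with hypothetical names:

```scala
// A val can override a val (or a def); a def cannot override a val.
trait Base {
  protected val catalogName: String = "hive"
  def describe: String = s"catalog=$catalogName"
}
class V1Suite extends Base {
  override protected val catalogName: String = "spark_catalog"
}
// class Bad extends Base {
//   override protected def catalogName: String = "x" // does not compile
// }

assert((new Base {}).describe == "catalog=hive")
assert(new V1Suite().describe == "catalog=spark_catalog")
```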
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateNamespaceSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateNamespaceSuite.scala
index 855eb0c674b..d6b90cc0419 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateNamespaceSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateNamespaceSuite.scala
@@ -62,7 +62,8 @@ trait CreateNamespaceSuiteBase extends DDLCommandTestUtils {
val e = intercept[IllegalArgumentException] {
sql(s"CREATE NAMESPACE $ns LOCATION ''")
}
- assert(e.getMessage.contains("Can not create a Path from an empty string"))
+ assert(e.getMessage.contains("Can not create a Path from an empty string") ||
+ e.getMessage.contains("The location name cannot be empty string"))
val uri = new Path(path).toUri
sql(s"CREATE NAMESPACE $ns LOCATION '$uri'")
@@ -83,7 +84,8 @@ trait CreateNamespaceSuiteBase extends DDLCommandTestUtils {
val e = intercept[NamespaceAlreadyExistsException] {
sql(s"CREATE NAMESPACE $ns")
}
- assert(e.getMessage.contains(s"Namespace '$namespace' already exists"))
+ assert(e.getMessage.contains(s"Namespace '$namespace' already exists") ||
+ e.getMessage.contains(s"Cannot create schema `fakens` because it already exists"))
// The following will be no-op since the namespace already exists.
Try { sql(s"CREATE NAMESPACE IF NOT EXISTS $ns") }.isSuccess
@@ -131,8 +133,6 @@ trait CreateNamespaceSuiteBase extends DDLCommandTestUtils {
class CreateNamespaceV2Suite extends CreateNamespaceSuiteBase {
- override protected def catalogName: String = super.catalogName
-
override protected def catalogVersion: String = "Hive V2"
override protected def commandVersion: String = V2_COMMAND_VERSION
@@ -142,7 +142,7 @@ class CreateNamespaceV1Suite extends CreateNamespaceSuiteBase {
val SESSION_CATALOG_NAME: String = "spark_catalog"
- override protected def catalogName: String = SESSION_CATALOG_NAME
+ override protected val catalogName: String = SESSION_CATALOG_NAME
override protected def catalogVersion: String = "V1"
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateTableSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateTableSuite.scala
new file mode 100644
index 00000000000..d26ec420980
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/CreateTableSuite.scala
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.spark.connector.hive.command
+
+import org.apache.spark.sql.connector.catalog.Identifier
+
+import org.apache.kyuubi.spark.connector.hive.{HiveTable, HiveTableCatalog}
+import org.apache.kyuubi.spark.connector.hive.command.DDLCommandTestUtils.V2_COMMAND_VERSION
+
+class CreateTableSuite extends DDLCommandTestUtils {
+
+ override protected def command: String = "CREATE TABLE"
+
+ override protected def catalogVersion: String = "Hive V2"
+
+ override protected def commandVersion: String = V2_COMMAND_VERSION
+
+ test("Create datasource table") {
+ val hiveCatalog = spark.sessionState.catalogManager
+ .catalog(catalogName).asInstanceOf[HiveTableCatalog]
+ val table = "hive.default.employee"
+ Seq("parquet", "orc").foreach { provider =>
+ withTable(table) {
+ sql(
+ s"""
+ | CREATE TABLE IF NOT EXISTS
+ | $table (id String, year String, month string)
+ | USING $provider
+ | PARTITIONED BY (year, month)
+ |""".stripMargin).collect()
+ val employee = Identifier.of(Array("default"), "employee")
+ val loadTable = hiveCatalog.loadTable(employee)
+ assert(loadTable.isInstanceOf[HiveTable])
+ val catalogTable = loadTable.asInstanceOf[HiveTable].catalogTable
+ assert(catalogTable.provider.isDefined)
+ assert(catalogTable.provider.get.equalsIgnoreCase(provider))
+ }
+ }
+ }
+
+ test("Create hive table") {
+ val hiveCatalog = spark.sessionState.catalogManager
+ .catalog(catalogName).asInstanceOf[HiveTableCatalog]
+ val table = "hive.default.employee"
+ Seq("parquet", "orc").foreach { provider =>
+ withTable(table) {
+ sql(
+ s"""
+ | CREATE TABLE IF NOT EXISTS
+ | $table (id String, year String, month string)
+ | STORED AS $provider
+ | PARTITIONED BY (year, month)
+ |""".stripMargin).collect()
+ val employee = Identifier.of(Array("default"), "employee")
+ val loadTable = hiveCatalog.loadTable(employee)
+ assert(loadTable.isInstanceOf[HiveTable])
+ val catalogTable = loadTable.asInstanceOf[HiveTable].catalogTable
+ assert(catalogTable.provider.isDefined)
+ assert(catalogTable.provider.get.equalsIgnoreCase("hive"))
+ assert(catalogTable.storage.serde.getOrElse("Unknown").contains(provider))
+ }
+ }
+ }
+}
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/DropNamespaceSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/DropNamespaceSuite.scala
index 66eb42c86ad..eebfbe48812 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/DropNamespaceSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/DropNamespaceSuite.scala
@@ -20,7 +20,9 @@ package org.apache.kyuubi.spark.connector.hive.command
import org.apache.spark.sql.{AnalysisException, Row}
import org.apache.spark.sql.types.{StringType, StructType}
+import org.apache.kyuubi.spark.connector.common.SparkUtils.SPARK_RUNTIME_VERSION
import org.apache.kyuubi.spark.connector.hive.command.DDLCommandTestUtils.{V1_COMMAND_VERSION, V2_COMMAND_VERSION}
+import org.apache.kyuubi.util.AssertionUtils.interceptContains
trait DropNamespaceSuiteBase extends DDLCommandTestUtils {
override protected def command: String = "DROP NAMESPACE"
@@ -60,7 +62,8 @@ trait DropNamespaceSuiteBase extends DDLCommandTestUtils {
val message = intercept[AnalysisException] {
sql(s"DROP NAMESPACE $catalogName.unknown")
}.getMessage
- assert(message.contains(s"'unknown' not found"))
+ assert(message.contains(s"'unknown' not found") ||
+ message.contains(s"The schema `unknown` cannot be found"))
}
test("drop non-empty namespace with a non-cascading mode") {
@@ -69,10 +72,14 @@ trait DropNamespaceSuiteBase extends DDLCommandTestUtils {
checkNamespace(Seq(namespace) ++ builtinNamespace)
// $catalog.ns.table is present, thus $catalog.ns cannot be dropped.
- val e = intercept[IllegalStateException] {
+ interceptContains[AnalysisException] {
sql(s"DROP NAMESPACE $catalogName.$namespace")
- }
- assert(e.getMessage.contains(s"Namespace $namespace is not empty"))
+ }(if (SPARK_RUNTIME_VERSION >= "3.4") {
+ s"[SCHEMA_NOT_EMPTY] Cannot drop a schema `$namespace` because it contains objects"
+ } else {
+ "Use CASCADE option to drop a non-empty database"
+ })
+
sql(s"DROP TABLE $catalogName.$namespace.table")
// Now that $catalog.ns is empty, it can be dropped.
@@ -99,8 +106,6 @@ trait DropNamespaceSuiteBase extends DDLCommandTestUtils {
class DropNamespaceV2Suite extends DropNamespaceSuiteBase {
- override protected def catalogName: String = super.catalogName
-
override protected def catalogVersion: String = "Hive V2"
override protected def commandVersion: String = V2_COMMAND_VERSION
@@ -110,7 +115,7 @@ class DropNamespaceV1Suite extends DropNamespaceSuiteBase {
val SESSION_CATALOG_NAME: String = "spark_catalog"
- override protected def catalogName: String = SESSION_CATALOG_NAME
+ override protected val catalogName: String = SESSION_CATALOG_NAME
override protected def catalogVersion: String = "V1"
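The hunk above swaps a raw `intercept`/`assert` pair for `interceptContains`, which folds "an exception of this type was thrown" and "its message contains this text" into one call. The real helper lives in `org.apache.kyuubi.util.AssertionUtils`; the following standalone reimplementation is illustrative only:

```scala
import scala.reflect.ClassTag

// Run `body`, require that it throws a T, and that the message contains
// `expected`. Sketch of an interceptContains-style helper, not the real one.
def interceptContains[T <: Throwable: ClassTag](body: => Any)(expected: String): T = {
  val cls = implicitly[ClassTag[T]].runtimeClass
  val thrown =
    try { body; None }
    catch { case t: Throwable if cls.isInstance(t) => Some(t.asInstanceOf[T]) }
  val e = thrown.getOrElse(
    throw new AssertionError(s"Expected ${cls.getName} to be thrown"))
  assert(e.getMessage != null && e.getMessage.contains(expected),
    s"Message '${e.getMessage}' does not contain '$expected'")
  e
}

val e = interceptContains[IllegalStateException] {
  throw new IllegalStateException("Namespace ns is not empty")
}("is not empty")
assert(e.isInstanceOf[IllegalStateException])
```

An unexpected exception type fails the guard and propagates, matching the semantics of ScalaTest's `intercept`.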
diff --git a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/ShowTablesSuite.scala b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/ShowTablesSuite.scala
index bff47c9de56..445ca9fa7a5 100644
--- a/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/ShowTablesSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-hive/src/test/scala/org/apache/kyuubi/spark/connector/hive/command/ShowTablesSuite.scala
@@ -96,8 +96,6 @@ trait ShowTablesSuiteBase extends DDLCommandTestUtils {
class ShowTablesV2Suite extends ShowTablesSuiteBase {
- override protected def catalogName: String = super.catalogName
-
override protected def catalogVersion: String = "Hive V2"
override protected def commandVersion: String = V2_COMMAND_VERSION
@@ -107,7 +105,7 @@ class ShowTablesV1Suite extends ShowTablesSuiteBase {
val SESSION_CATALOG_NAME: String = "spark_catalog"
- override protected def catalogName: String = SESSION_CATALOG_NAME
+ override protected val catalogName: String = SESSION_CATALOG_NAME
override protected def catalogVersion: String = "V1"
diff --git a/extensions/spark/kyuubi-spark-connector-kudu/src/test/resources/kudu-compose.yml b/extensions/spark/kyuubi-spark-connector-kudu/src/test/resources/kudu-compose.yml
deleted file mode 100644
index 149cd5d47ac..00000000000
--- a/extensions/spark/kyuubi-spark-connector-kudu/src/test/resources/kudu-compose.yml
+++ /dev/null
@@ -1,64 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-version: "3"
-services:
- kudu-master:
- image: apache/kudu:1.15.0
- hostname: kudu-master
- ports:
- - "7051"
- - "8051"
- command: ["master"]
- environment:
- - KUDU_MASTERS=kudu-master
-
- kudu-tserver-1:
- image: apache/kudu:1.15.0
- depends_on:
- - kudu-master
- hostname: kudu-tserver-1
- ports:
- - "7050"
- - "8050"
- command: ["tserver"]
- environment:
- - KUDU_MASTERS=kudu-master
-
- kudu-tserver-2:
- image: apache/kudu:1.15.0
- depends_on:
- - kudu-master
- hostname: kudu-tserver-2
- ports:
- - "7050"
- - "8050"
- command: [ "tserver" ]
- environment:
- - KUDU_MASTERS=kudu-master
-
- kudu-tserver-3:
- image: apache/kudu:1.15.0
- depends_on:
- - kudu-master
- hostname: kudu-tserver-3
- ports:
- - "7050"
- - "8050"
- command: [ "tserver" ]
- environment:
- - KUDU_MASTERS=kudu-master
diff --git a/extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduMixin.scala b/extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduMixin.scala
deleted file mode 100644
index dee09db387a..00000000000
--- a/extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduMixin.scala
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.spark.connector.kudu
-
-import java.io.File
-
-import com.dimafeng.testcontainers.{DockerComposeContainer, ExposedService, ForAllTestContainer}
-
-import org.apache.kyuubi.{KyuubiFunSuite, Utils}
-
-trait KuduMixin extends KyuubiFunSuite with ForAllTestContainer {
-
- private val KUDU_MASTER_PORT = 7051
-
- override val container: DockerComposeContainer =
- DockerComposeContainer
- .Def(
- composeFiles =
- new File(Utils.getContextOrKyuubiClassLoader.getResource("kudu-compose.yml").toURI),
- exposedServices = ExposedService("kudu-master", KUDU_MASTER_PORT) :: Nil)
- .createContainer()
-
- def kuduMasterHost: String = container.getServiceHost("kudu-master", KUDU_MASTER_PORT)
- def kuduMasterPort: Int = container.getServicePort("kudu-master", KUDU_MASTER_PORT)
- def kuduMasterUrl: String = s"$kuduMasterHost:$kuduMasterPort"
-}
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/pom.xml b/extensions/spark/kyuubi-spark-connector-tpcds/pom.xml
index e9b86773973..5999b8c6304 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/pom.xml
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/pom.xml
@@ -21,11 +21,11 @@
<parent>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-parent</artifactId>
- <version>1.8.0-SNAPSHOT</version>
+ <version>1.9.0-SNAPSHOT</version>
<relativePath>../../../pom.xml</relativePath>
</parent>

- <artifactId>kyuubi-spark-connector-tpcds_2.12</artifactId>
+ <artifactId>kyuubi-spark-connector-tpcds_${scala.binary.version}</artifactId>
<packaging>jar</packaging>
<name>Kyuubi Spark TPC-DS Connector</name>
<url>https://kyuubi.apache.org/</url>
@@ -173,7 +173,7 @@
<include>io.trino.tpcds:tpcds</include>
<include>com.google.guava:guava</include>
- <include>org.apache.kyuubi:kyuubi-spark-connector-common_${scala.binary.version}</include>
+ <include>org.apache.kyuubi:*</include>
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSCatalogSuite.scala b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSCatalogSuite.scala
index 8a37d95e854..f5c6563e770 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSCatalogSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSCatalogSuite.scala
@@ -23,7 +23,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.kyuubi.KyuubiFunSuite
import org.apache.kyuubi.spark.connector.common.LocalSparkSession.withSparkSession
-import org.apache.kyuubi.spark.connector.common.SparkUtils
+import org.apache.kyuubi.spark.connector.common.SparkUtils.SPARK_RUNTIME_VERSION
class TPCDSCatalogSuite extends KyuubiFunSuite {
@@ -35,7 +35,7 @@ class TPCDSCatalogSuite extends KyuubiFunSuite {
.set("spark.sql.catalog.tpcds", classOf[TPCDSCatalog].getName)
.set("spark.sql.cbo.enabled", "true")
.set("spark.sql.cbo.planStats.enabled", "true")
- withSparkSession(SparkSession.builder.config(sparkConf).getOrCreate()) { spark =>
+ withSparkSession(SparkSession.builder.config(sparkConf).getOrCreate()) { _ =>
val catalog = new TPCDSCatalog
val catalogName = "test"
catalog.initialize(catalogName, CaseInsensitiveStringMap.empty())
@@ -126,7 +126,7 @@ class TPCDSCatalogSuite extends KyuubiFunSuite {
val stats = spark.table(tableName).queryExecution.analyzed.stats
assert(stats.sizeInBytes == sizeInBytes)
// stats.rowCount only has value after SPARK-33954
- if (SparkUtils.isSparkVersionAtLeast("3.2")) {
+ if (SPARK_RUNTIME_VERSION >= "3.2") {
assert(stats.rowCount.contains(rowCount), tableName)
}
}
@@ -170,7 +170,8 @@ class TPCDSCatalogSuite extends KyuubiFunSuite {
val exception = intercept[AnalysisException] {
spark.table("tpcds.sf1.nonexistent_table")
}
- assert(exception.message === "Table or view not found: tpcds.sf1.nonexistent_table")
+ assert(exception.message.contains("Table or view not found")
+ || exception.message.contains("TABLE_OR_VIEW_NOT_FOUND"))
}
}
}
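The switch from `SparkUtils.isSparkVersionAtLeast("3.2")` to `SPARK_RUNTIME_VERSION >= "3.2"` works because `SPARK_RUNTIME_VERSION` is a semantic-version type with numeric ordering; a plain lexicographic string compare would misorder "3.10" against "3.2". A standalone sketch of that ordering (not Kyuubi's actual `SemanticVersion` implementation):

```scala
// Compare dotted versions numerically, segment by segment.
case class SemVer(major: Int, minor: Int) extends Ordered[SemVer] {
  def compare(that: SemVer): Int =
    if (major != that.major) major.compare(that.major)
    else minor.compare(that.minor)
}
def parse(s: String): SemVer = {
  val parts = s.split('.')
  SemVer(parts(0).toInt, if (parts.length > 1) parts(1).toInt else 0)
}

assert(parse("3.10") >= parse("3.2")) // numeric: minor 10 >= 2
assert("3.10" < "3.2")                // string compare orders the other way
```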
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSQuerySuite.scala b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSQuerySuite.scala
index 83679989a79..c99d7becafa 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSQuerySuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSQuerySuite.scala
@@ -28,26 +28,17 @@ import org.apache.kyuubi.{KyuubiFunSuite, Utils}
import org.apache.kyuubi.spark.connector.common.GoldenFileUtils._
import org.apache.kyuubi.spark.connector.common.LocalSparkSession.withSparkSession
-// scalastyle:off line.size.limit
/**
* To run this test suite:
* {{{
- * build/mvn clean install \
- * -pl extensions/spark/kyuubi-spark-connector-tpcds -am \
- * -Dmaven.plugin.scalatest.exclude.tags="" \
- * -Dtest=none -DwildcardSuites=org.apache.kyuubi.spark.connector.tpcds.TPCDSQuerySuite
+ * KYUUBI_UPDATE=0 dev/gen/gen_tpcds_queries.sh
* }}}
*
* To re-generate golden files for this suite:
* {{{
- * KYUUBI_UPDATE=1 build/mvn clean install \
- * -pl extensions/spark/kyuubi-spark-connector-tpcds -am \
- * -Dmaven.plugin.scalatest.exclude.tags="" \
- * -Dtest=none -DwildcardSuites=org.apache.kyuubi.spark.connector.tpcds.TPCDSQuerySuite
+ * dev/gen/gen_tpcds_queries.sh
* }}}
*/
-// scalastyle:on line.size.limit
-
@Slow
class TPCDSQuerySuite extends KyuubiFunSuite {
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/spark/kyuubi/benchmark/KyuubiBenchmarkBase.scala b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/spark/kyuubi/benchmark/KyuubiBenchmarkBase.scala
index bee515592da..e4399891845 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/spark/kyuubi/benchmark/KyuubiBenchmarkBase.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/src/test/scala/org/apache/spark/kyuubi/benchmark/KyuubiBenchmarkBase.scala
@@ -22,6 +22,7 @@ import java.io.{File, FileOutputStream, OutputStream}
import scala.collection.JavaConverters._
import com.google.common.reflect.ClassPath
+import org.scalatest.Assertions._
trait KyuubiBenchmarkBase {
var output: Option[OutputStream] = None
diff --git a/extensions/spark/kyuubi-spark-connector-tpch/pom.xml b/extensions/spark/kyuubi-spark-connector-tpch/pom.xml
index 5b418e200e7..22a5405a6a0 100644
--- a/extensions/spark/kyuubi-spark-connector-tpch/pom.xml
+++ b/extensions/spark/kyuubi-spark-connector-tpch/pom.xml
@@ -21,11 +21,11 @@
<parent>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-parent</artifactId>
- <version>1.8.0-SNAPSHOT</version>
+ <version>1.9.0-SNAPSHOT</version>
<relativePath>../../../pom.xml</relativePath>
</parent>

- <artifactId>kyuubi-spark-connector-tpch_2.12</artifactId>
+ <artifactId>kyuubi-spark-connector-tpch_${scala.binary.version}</artifactId>
<packaging>jar</packaging>
<name>Kyuubi Spark TPC-H Connector</name>
<url>https://kyuubi.apache.org/</url>
@@ -172,7 +172,7 @@
                                    <include>io.trino.tpch:tpch</include>
                                    <include>com.google.guava:guava</include>
-                                   <include>org.apache.kyuubi:kyuubi-spark-connector-common_${scala.binary.version}</include>
+                                   <include>org.apache.kyuubi:*</include>
diff --git a/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHBatchScan.scala b/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHBatchScan.scala
index b5bca42cc11..63ff82b7a3f 100644
--- a/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHBatchScan.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHBatchScan.scala
@@ -144,7 +144,7 @@ class TPCHPartitionReader(
case (value, dt) => throw new IllegalArgumentException(s"value: $value, type: $dt")
}
}
- InternalRow.fromSeq(rowAny)
+ InternalRow.fromSeq(rowAny.toSeq)
}
hasNext
}
diff --git a/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHTable.scala b/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHTable.scala
index de4bd49f220..65038d35bc1 100644
--- a/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHTable.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpch/src/main/scala/org/apache/kyuubi/spark/connector/tpch/TPCHTable.scala
@@ -44,7 +44,7 @@ class TPCHTable(tbl: String, scale: Double, tpchConf: TPCHConf)
StructType(
tpchTable.asInstanceOf[TpchTable[TpchEntity]].getColumns.zipWithIndex.map { case (c, _) =>
StructField(c.getColumnName, toSparkDataType(c.getType))
- })
+ }.toSeq)
}
override def capabilities(): util.Set[TableCapability] =
diff --git a/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHCatalogSuite.scala b/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHCatalogSuite.scala
index ee817ecae30..14415141e63 100644
--- a/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHCatalogSuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHCatalogSuite.scala
@@ -23,7 +23,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.kyuubi.KyuubiFunSuite
import org.apache.kyuubi.spark.connector.common.LocalSparkSession.withSparkSession
-import org.apache.kyuubi.spark.connector.common.SparkUtils
+import org.apache.kyuubi.spark.connector.common.SparkUtils.SPARK_RUNTIME_VERSION
class TPCHCatalogSuite extends KyuubiFunSuite {
@@ -35,7 +35,7 @@ class TPCHCatalogSuite extends KyuubiFunSuite {
.set("spark.sql.catalog.tpch", classOf[TPCHCatalog].getName)
.set("spark.sql.cbo.enabled", "true")
.set("spark.sql.cbo.planStats.enabled", "true")
- withSparkSession(SparkSession.builder.config(sparkConf).getOrCreate()) { spark =>
+ withSparkSession(SparkSession.builder.config(sparkConf).getOrCreate()) { _ =>
val catalog = new TPCHCatalog
val catalogName = "test"
catalog.initialize(catalogName, CaseInsensitiveStringMap.empty())
@@ -130,7 +130,7 @@ class TPCHCatalogSuite extends KyuubiFunSuite {
val stats = spark.table(tableName).queryExecution.analyzed.stats
assert(stats.sizeInBytes == sizeInBytes)
// stats.rowCount only has value after SPARK-33954
- if (SparkUtils.isSparkVersionAtLeast("3.2")) {
+ if (SPARK_RUNTIME_VERSION >= "3.2") {
assert(stats.rowCount.contains(rowCount), tableName)
}
}
@@ -158,7 +158,8 @@ class TPCHCatalogSuite extends KyuubiFunSuite {
val exception = intercept[AnalysisException] {
spark.table("tpch.sf1.nonexistent_table")
}
- assert(exception.message === "Table or view not found: tpch.sf1.nonexistent_table")
+ assert(exception.message.contains("Table or view not found")
+ || exception.message.contains("TABLE_OR_VIEW_NOT_FOUND"))
}
}
}
diff --git a/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHQuerySuite.scala b/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHQuerySuite.scala
index efeaeb36c6e..a409a5fe927 100644
--- a/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHQuerySuite.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpch/src/test/scala/org/apache/kyuubi/spark/connector/tpch/TPCHQuerySuite.scala
@@ -28,33 +28,24 @@ import org.apache.kyuubi.{KyuubiFunSuite, Utils}
import org.apache.kyuubi.spark.connector.common.GoldenFileUtils._
import org.apache.kyuubi.spark.connector.common.LocalSparkSession.withSparkSession
-// scalastyle:off line.size.limit
/**
* To run this test suite:
* {{{
- * build/mvn clean install \
- * -pl extensions/spark/kyuubi-spark-connector-tpch -am \
- * -Dmaven.plugin.scalatest.exclude.tags="" \
- * -Dtest=none -DwildcardSuites=org.apache.kyuubi.spark.connector.tpch.TPCHQuerySuite
+ * KYUUBI_UPDATE=0 dev/gen/gen_tpch_queries.sh
* }}}
*
* To re-generate golden files for this suite:
* {{{
- * KYUUBI_UPDATE=1 build/mvn clean install \
- * -pl extensions/spark/kyuubi-spark-connector-tpch -am \
- * -Dmaven.plugin.scalatest.exclude.tags="" \
- * -Dtest=none -DwildcardSuites=org.apache.kyuubi.spark.connector.tpch.TPCHQuerySuite
+ * dev/gen/gen_tpch_queries.sh
* }}}
*/
-// scalastyle:on line.size.limit
-
@Slow
class TPCHQuerySuite extends KyuubiFunSuite {
val queries: Set[String] = (1 to 22).map(i => s"q$i").toSet
test("run query on tiny") {
- val viewSuffix = "view";
+ val viewSuffix = "view"
val sparkConf = new SparkConf().setMaster("local[*]")
.set("spark.ui.enabled", "false")
.set("spark.sql.catalogImplementation", "in-memory")
diff --git a/extensions/spark/kyuubi-spark-lineage/README.md b/extensions/spark/kyuubi-spark-lineage/README.md
index 34f2733b4f6..1c42d3736e3 100644
--- a/extensions/spark/kyuubi-spark-lineage/README.md
+++ b/extensions/spark/kyuubi-spark-lineage/README.md
@@ -26,7 +26,7 @@
## Build
```shell
-build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -Dspark.version=3.2.1
+build/mvn clean package -DskipTests -pl :kyuubi-spark-lineage_2.12 -am -Dspark.version=3.2.1
```
### Supported Apache Spark Versions
@@ -34,6 +34,7 @@ build/mvn clean package -pl :kyuubi-spark-lineage_2.12 -Dspark.version=3.2.1
`-Dspark.version=`
- [x] master
+- [ ] 3.4.x
- [x] 3.3.x (default)
- [x] 3.2.x
- [x] 3.1.x
diff --git a/extensions/spark/kyuubi-spark-lineage/pom.xml b/extensions/spark/kyuubi-spark-lineage/pom.xml
index bc13480d77c..270bf4d0453 100644
--- a/extensions/spark/kyuubi-spark-lineage/pom.xml
+++ b/extensions/spark/kyuubi-spark-lineage/pom.xml
@@ -21,16 +21,21 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-spark-lineage_2.12</artifactId>
+    <artifactId>kyuubi-spark-lineage_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Dev Spark Lineage Extension</name>
     <url>https://kyuubi.apache.org/</url>

     <dependencies>
+        <dependency>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-sql_${scala.binary.version}</artifactId>
@@ -54,7 +59,85 @@
         <dependency>
             <groupId>commons-collections</groupId>
             <artifactId>commons-collections</artifactId>
-            <scope>test</scope>
+            <scope>provided</scope>
         </dependency>

+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-annotations</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-core</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-databind</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.httpcomponents</groupId>
+            <artifactId>httpclient</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>commons-lang</groupId>
+            <artifactId>commons-lang</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.atlas</groupId>
+            <artifactId>atlas-client-v2</artifactId>
+            <version>${atlas.version}</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.slf4j</groupId>
+                    <artifactId>slf4j-log4j12</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.slf4j</groupId>
+                    <artifactId>slf4j-api</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.slf4j</groupId>
+                    <artifactId>jul-to-slf4j</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>commons-logging</groupId>
+                    <artifactId>commons-logging</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.apache.hadoop</groupId>
+                    <artifactId>hadoop-common</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.springframework</groupId>
+                    <artifactId>spring-context</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.apache.commons</groupId>
+                    <artifactId>commons-text</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
@@ -84,11 +167,9 @@
             <artifactId>spark-hive_${scala.binary.version}</artifactId>
             <scope>test</scope>
-
-
                 <directory>${project.basedir}/src/test/resources</directory>
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcher.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcher.scala
index 8f5dc0d9e61..b993f14282a 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcher.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcher.scala
@@ -20,6 +20,7 @@ package org.apache.kyuubi.plugin.lineage
import org.apache.spark.sql.execution.QueryExecution
import org.apache.kyuubi.plugin.lineage.dispatcher.{KyuubiEventDispatcher, SparkEventDispatcher}
+import org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasLineageDispatcher
trait LineageDispatcher {
@@ -35,6 +36,7 @@ object LineageDispatcher {
LineageDispatcherType.withName(dispatcherType) match {
case LineageDispatcherType.SPARK_EVENT => new SparkEventDispatcher()
case LineageDispatcherType.KYUUBI_EVENT => new KyuubiEventDispatcher()
+ case LineageDispatcherType.ATLAS => new AtlasLineageDispatcher()
case _ => throw new UnsupportedOperationException(
s"Unsupported lineage dispatcher: $dispatcherType.")
}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcherType.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcherType.scala
index d6afea15233..8e07f6d7769 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcherType.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageDispatcherType.scala
@@ -20,5 +20,5 @@ package org.apache.kyuubi.plugin.lineage
object LineageDispatcherType extends Enumeration {
type LineageDispatcherType = Value
- val SPARK_EVENT, KYUUBI_EVENT = Value
+ val SPARK_EVENT, KYUUBI_EVENT, ATLAS = Value
}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageParserProvider.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageParserProvider.scala
new file mode 100644
index 00000000000..665efef100e
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/LineageParserProvider.scala
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+import org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParseHelper
+
+object LineageParserProvider {
+ def parse(spark: SparkSession, plan: LogicalPlan): Lineage = {
+ SparkSQLLineageParseHelper(spark).parse(plan)
+ }
+}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClient.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClient.scala
new file mode 100644
index 00000000000..15b12718284
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClient.scala
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+import java.util.Locale
+
+import com.google.common.annotations.VisibleForTesting
+import org.apache.atlas.AtlasClientV2
+import org.apache.atlas.model.instance.AtlasEntity
+import org.apache.atlas.model.instance.AtlasEntity.AtlasEntitiesWithExtInfo
+import org.apache.commons.lang3.StringUtils
+import org.apache.hadoop.util.ShutdownHookManager
+
+import org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasClientConf._
+
+trait AtlasClient extends AutoCloseable {
+ def send(entities: Seq[AtlasEntity]): Unit
+}
+
+class AtlasRestClient(conf: AtlasClientConf) extends AtlasClient {
+
+ private val atlasClient: AtlasClientV2 = {
+ val serverUrl = conf.get(ATLAS_REST_ENDPOINT).split(",")
+ val username = conf.get(CLIENT_USERNAME)
+ val password = conf.get(CLIENT_PASSWORD)
+ if (StringUtils.isNoneBlank(username, password)) {
+ new AtlasClientV2(serverUrl, Array(username, password))
+ } else {
+ new AtlasClientV2(serverUrl: _*)
+ }
+ }
+
+ override def send(entities: Seq[AtlasEntity]): Unit = {
+ val entitiesWithExtInfo = new AtlasEntitiesWithExtInfo()
+ entities.foreach(entitiesWithExtInfo.addEntity)
+ atlasClient.createEntities(entitiesWithExtInfo)
+ }
+
+ override def close(): Unit = {
+ if (atlasClient != null) {
+ atlasClient.close()
+ }
+ }
+}
+
+object AtlasClient {
+
+ @volatile private var client: AtlasClient = _
+
+ def getClient(): AtlasClient = {
+ if (client == null) {
+ AtlasClient.synchronized {
+ if (client == null) {
+ val clientConf = AtlasClientConf.getConf()
+ client = clientConf.get(CLIENT_TYPE).toLowerCase(Locale.ROOT) match {
+ case "rest" => new AtlasRestClient(clientConf)
+ case unknown => throw new RuntimeException(s"Unsupported client type: $unknown.")
+ }
+ registerCleanupShutdownHook(client)
+ }
+ }
+ }
+ client
+ }
+
+ private def registerCleanupShutdownHook(client: AtlasClient): Unit = {
+ ShutdownHookManager.get.addShutdownHook(
+ () => {
+ if (client != null) {
+ client.close()
+ }
+ },
+ Integer.MAX_VALUE)
+ }
+
+ @VisibleForTesting
+ private[dispatcher] def setClient(newClient: AtlasClient): Unit = {
+ client = newClient
+ }
+
+}
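`AtlasClient.getClient()` above lazily builds one process-wide client using double-checked locking: an unsynchronized fast path on a `@volatile` field, then a locked re-check before construction. A minimal standalone sketch of that pattern, with hypothetical `Client`/`Registry` names that are not part of the patch:

```scala
// Hypothetical stand-ins for AtlasClient and its companion object.
trait Client { def id: Int }

object Registry {
  @volatile private var instance: Client = null
  private var constructions = 0

  def getClient(): Client = {
    if (instance == null) {       // fast path: no lock once initialized
      Registry.synchronized {     // lock only while initialization may be needed
        if (instance == null) {   // re-check: another thread may have won the race
          constructions += 1
          val n = constructions
          instance = new Client { def id: Int = n }
        }
      }
    }
    instance
  }

  def constructionCount: Int = constructions
}

assert(Registry.getClient() eq Registry.getClient()) // same instance every call
assert(Registry.constructionCount == 1)              // constructed exactly once
```

The `@volatile` annotation is what makes the unsynchronized first check safe on the JVM; without it, a thread could observe a partially constructed client.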
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClientConf.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClientConf.scala
new file mode 100644
index 00000000000..03b1a83e0e3
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasClientConf.scala
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+import org.apache.atlas.ApplicationProperties
+import org.apache.commons.configuration.Configuration
+import org.apache.spark.kyuubi.lineage.SparkContextHelper
+
+class AtlasClientConf(configuration: Configuration) {
+
+ def get(entry: ConfigEntry): String = {
+ configuration.getProperty(entry.key) match {
+ case s: String => s
+ case l: List[_] => l.mkString(",")
+ case o if o != null => o.toString
+ case _ => entry.defaultValue
+ }
+ }
+
+}
+
+object AtlasClientConf {
+
+ private lazy val clientConf: AtlasClientConf = {
+ val conf = ApplicationProperties.get()
+ SparkContextHelper.globalSparkContext.getConf.getAllWithPrefix("spark.atlas.")
+ .foreach { case (k, v) => conf.setProperty(s"atlas.$k", v) }
+ new AtlasClientConf(conf)
+ }
+
+ def getConf(): AtlasClientConf = clientConf
+
+ val ATLAS_REST_ENDPOINT = ConfigEntry("atlas.rest.address", "http://localhost:21000")
+
+ val CLIENT_TYPE = ConfigEntry("atlas.client.type", "rest")
+ val CLIENT_USERNAME = ConfigEntry("atlas.client.username", null)
+ val CLIENT_PASSWORD = ConfigEntry("atlas.client.password", null)
+
+ val CLUSTER_NAME = ConfigEntry("atlas.cluster.name", "primary")
+
+ val COLUMN_LINEAGE_ENABLED = ConfigEntry("atlas.hook.spark.column.lineage.enabled", "true")
+}
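`AtlasClientConf.get` above resolves a property from Atlas' `Configuration`: a configured string wins, a multi-valued property is joined with commas, and otherwise the `ConfigEntry` default applies. A standalone sketch of that lookup order, using a plain `Map` in place of the Atlas `Configuration` (names here are illustrative):

```scala
// Mirrors the ConfigEntry case class added by the patch.
case class ConfEntry(key: String, defaultValue: String)

class ClientConf(props: Map[String, Any]) {
  def get(entry: ConfEntry): String = props.get(entry.key) match {
    case Some(s: String)  => s
    case Some(l: List[_]) => l.mkString(",") // multi-valued properties join with commas
    case Some(o)          => o.toString
    case None             => entry.defaultValue
  }
}

val restEndpoint = ConfEntry("atlas.rest.address", "http://localhost:21000")
val clusterName  = ConfEntry("atlas.cluster.name", "primary")

val conf = new ClientConf(
  Map("atlas.rest.address" -> List("http://a:21000", "http://b:21000")))

assert(conf.get(restEndpoint) == "http://a:21000,http://b:21000")
assert(conf.get(clusterName) == "primary") // not configured: falls back to default
```

In the real dispatcher, `spark.atlas.*` entries from the Spark conf are copied over the values loaded from `atlas-application.properties` before lookups happen.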
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala
new file mode 100644
index 00000000000..cfa19b7aa87
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+import scala.collection.JavaConverters._
+
+import org.apache.atlas.model.instance.{AtlasEntity, AtlasObjectId, AtlasRelatedObjectId}
+import org.apache.spark.kyuubi.lineage.{LineageConf, SparkContextHelper}
+import org.apache.spark.sql.execution.QueryExecution
+
+import org.apache.kyuubi.plugin.lineage.Lineage
+import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper
+
+/**
+ * Helpers for building Atlas Spark entities from a [[Lineage]].
+ * The Atlas Spark models are defined in:
+ * https://github.com/apache/atlas/blob/master/addons/models/1000-Hadoop/1100-spark_model.json
+ */
+object AtlasEntityHelper {
+
+ /**
+   * Generate a `spark_process` Atlas entity from the lineage.
+   * @param qe the query execution the lineage was extracted from
+   * @param lineage the extracted table-level lineage
+   * @return the `spark_process` entity
+ */
+ def processEntity(qe: QueryExecution, lineage: Lineage): AtlasEntity = {
+ val entity = new AtlasEntity(PROCESS_TYPE)
+
+ val appId = SparkContextHelper.globalSparkContext.applicationId
+ val appName = SparkContextHelper.globalSparkContext.appName match {
+ case "Spark shell" => s"Spark Job $appId"
+ case default => s"$default $appId"
+ }
+
+ entity.setAttribute("qualifiedName", appId)
+ entity.setAttribute("name", appName)
+ entity.setAttribute("currUser", SparkListenerHelper.currentUser)
+ SparkListenerHelper.sessionUser.foreach(entity.setAttribute("remoteUser", _))
+ entity.setAttribute("executionId", qe.id)
+ entity.setAttribute("details", qe.toString())
+ entity.setAttribute("sparkPlanDescription", qe.sparkPlan.toString())
+
+ // TODO add entity type instead of parsing from string
+ val inputs = lineage.inputTables.flatMap(tableObjectId).map { objId =>
+ relatedObjectId(objId, RELATIONSHIP_DATASET_PROCESS_INPUTS)
+ }
+ val outputs = lineage.outputTables.flatMap(tableObjectId).map { objId =>
+ relatedObjectId(objId, RELATIONSHIP_PROCESS_DATASET_OUTPUTS)
+ }
+
+ entity.setRelationshipAttribute("inputs", inputs.asJava)
+ entity.setRelationshipAttribute("outputs", outputs.asJava)
+
+ entity
+ }
+
+ /**
+   * Generate `spark_column_lineage` Atlas entities from the lineage.
+   * @param processEntity the parent `spark_process` entity
+   * @param lineage the extracted column-level lineage
+   * @return one entity per output column whose inputs and output both resolve
+ */
+ def columnLineageEntities(processEntity: AtlasEntity, lineage: Lineage): Seq[AtlasEntity] = {
+ lineage.columnLineage.flatMap(columnLineage => {
+ val inputs = columnLineage.originalColumns.flatMap(columnObjectId).map { objId =>
+ relatedObjectId(objId, RELATIONSHIP_DATASET_PROCESS_INPUTS)
+ }
+ val outputs = Option(columnLineage.column).flatMap(columnObjectId).map { objId =>
+ relatedObjectId(objId, RELATIONSHIP_PROCESS_DATASET_OUTPUTS)
+ }.toSeq
+
+ if (inputs.nonEmpty && outputs.nonEmpty) {
+ val entity = new AtlasEntity(COLUMN_LINEAGE_TYPE)
+ val outputColumnName = buildColumnQualifiedName(columnLineage.column).get
+ val qualifiedName =
+ s"${processEntity.getAttribute("qualifiedName")}:${outputColumnName}"
+ entity.setAttribute("qualifiedName", qualifiedName)
+ entity.setAttribute("name", qualifiedName)
+ entity.setRelationshipAttribute("inputs", inputs.asJava)
+ entity.setRelationshipAttribute("outputs", outputs.asJava)
+ entity.setRelationshipAttribute(
+ "process",
+ relatedObjectId(objectId(processEntity), RELATIONSHIP_SPARK_PROCESS_COLUMN_LINEAGE))
+ Some(entity)
+ } else {
+ None
+ }
+ })
+ }
+
+ def tableObjectId(tableName: String): Option[AtlasObjectId] = {
+ buildTableQualifiedName(tableName)
+ .map(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", _))
+ }
+
+ def buildTableQualifiedName(tableName: String): Option[String] = {
+ val defaultCatalog = LineageConf.DEFAULT_CATALOG
+ tableName.split('.') match {
+ case Array(`defaultCatalog`, db, table) =>
+ Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
+ case _ =>
+ None
+ }
+ }
+
+ def columnObjectId(columnName: String): Option[AtlasObjectId] = {
+ buildColumnQualifiedName(columnName)
+ .map(new AtlasObjectId(HIVE_COLUMN_TYPE, "qualifiedName", _))
+ }
+
+ def buildColumnQualifiedName(columnName: String): Option[String] = {
+ val defaultCatalog = LineageConf.DEFAULT_CATALOG
+ columnName.split('.') match {
+ case Array(`defaultCatalog`, db, table, column) =>
+ Some(s"${db.toLowerCase}.${table.toLowerCase}.${column.toLowerCase}@$cluster")
+ case _ =>
+ None
+ }
+ }
+
+ def objectId(entity: AtlasEntity): AtlasObjectId = {
+ val objId = new AtlasObjectId(entity.getGuid, entity.getTypeName)
+ objId.setUniqueAttributes(Map("qualifiedName" -> entity.getAttribute("qualifiedName")).asJava)
+ objId
+ }
+
+ def relatedObjectId(objectId: AtlasObjectId, relationshipType: String): AtlasRelatedObjectId = {
+ new AtlasRelatedObjectId(objectId, relationshipType)
+ }
+
+ lazy val cluster = AtlasClientConf.getConf().get(AtlasClientConf.CLUSTER_NAME)
+ lazy val columnLineageEnabled =
+ AtlasClientConf.getConf().get(AtlasClientConf.COLUMN_LINEAGE_ENABLED).toBoolean
+
+ val HIVE_TABLE_TYPE = "hive_table"
+ val HIVE_COLUMN_TYPE = "hive_column"
+ val PROCESS_TYPE = "spark_process"
+ val COLUMN_LINEAGE_TYPE = "spark_column_lineage"
+ val RELATIONSHIP_DATASET_PROCESS_INPUTS = "dataset_process_inputs"
+ val RELATIONSHIP_PROCESS_DATASET_OUTPUTS = "process_dataset_outputs"
+ val RELATIONSHIP_SPARK_PROCESS_COLUMN_LINEAGE = "spark_process_column_lineages"
+
+}
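`buildTableQualifiedName` above only maps tables in the session's default catalog to Atlas' Hive-style `db.table@cluster` qualified names; tables from other catalogs yield `None` and get no Atlas object id. A self-contained sketch of that scheme (the `spark_catalog` and `primary` values are illustrative defaults, not read from any conf here):

```scala
val defaultCatalog = "spark_catalog" // assumption: Spark's default session catalog name
val cluster = "primary"              // assumption: Atlas cluster name

def tableQualifiedName(tableName: String): Option[String] =
  tableName.split('.') match {
    // backtick pattern: matches only the literal default catalog name
    case Array(`defaultCatalog`, db, table) =>
      Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
    case _ =>
      None
  }

assert(tableQualifiedName("spark_catalog.Sales.Orders") == Some("sales.orders@primary"))
assert(tableQualifiedName("other_catalog.sales.orders").isEmpty) // non-default catalog: skipped
```

Lower-casing matters because Atlas' Hive model registers qualified names in lower case, while Spark preserves the user's casing.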
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcher.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcher.scala
new file mode 100644
index 00000000000..c66b5110746
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcher.scala
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.execution.QueryExecution
+
+import org.apache.kyuubi.plugin.lineage.{Lineage, LineageDispatcher}
+import org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasEntityHelper.columnLineageEnabled
+
+class AtlasLineageDispatcher extends LineageDispatcher with Logging {
+
+ override def send(qe: QueryExecution, lineageOpt: Option[Lineage]): Unit = {
+ try {
+ lineageOpt.filter(l => l.inputTables.nonEmpty || l.outputTables.nonEmpty).foreach(lineage => {
+ val processEntity = AtlasEntityHelper.processEntity(qe, lineage)
+ val columnLineageEntities = if (lineage.columnLineage.nonEmpty && columnLineageEnabled) {
+ AtlasEntityHelper.columnLineageEntities(processEntity, lineage)
+ } else {
+ Seq.empty
+ }
+ AtlasClient.getClient().send(processEntity +: columnLineageEntities)
+ })
+ } catch {
+ case t: Throwable =>
        logWarning("Failed to send lineage to Atlas.", t)
+ }
+ }
+
+ override def onFailure(qe: QueryExecution, exception: Exception): Unit = {
+ // ignore
+ }
+
+}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/ConfigEntry.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/ConfigEntry.scala
new file mode 100644
index 00000000000..3f9d9831de8
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/ConfigEntry.scala
@@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+case class ConfigEntry(key: String, defaultValue: String)
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SemanticVersion.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SemanticVersion.scala
deleted file mode 100644
index a4a8b2e0e2f..00000000000
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SemanticVersion.scala
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.plugin.lineage.helper
-
-/**
- * Encapsulate a component (Kyuubi/Spark/Hive/Flink etc.) version
- * for the convenience of version checks.
- */
-case class SemanticVersion(majorVersion: Int, minorVersion: Int) {
-
- def isVersionAtMost(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor < targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor <= targetMinor
- })
- }
-
- def isVersionAtLeast(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor > targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor >= targetMinor
- })
- }
-
- def isVersionEqualTo(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- runtimeMajor == targetMajor && runtimeMinor == targetMinor)
- }
-
- def compareVersion(
- targetVersionString: String,
- callback: (Int, Int, Int, Int) => Boolean): Boolean = {
- val targetVersion = SemanticVersion(targetVersionString)
- val targetMajor = targetVersion.majorVersion
- val targetMinor = targetVersion.minorVersion
- callback(targetMajor, targetMinor, this.majorVersion, this.minorVersion)
- }
-
- override def toString: String = s"$majorVersion.$minorVersion"
-}
-
-object SemanticVersion {
-
- def apply(versionString: String): SemanticVersion = {
- """^(\d+)\.(\d+)(\..*)?$""".r.findFirstMatchIn(versionString) match {
- case Some(m) =>
- SemanticVersion(m.group(1).toInt, m.group(2).toInt)
- case None =>
- throw new IllegalArgumentException(s"Tried to parse '$versionString' as a project" +
- s" version string, but it could not find the major and minor version numbers.")
- }
- }
-}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkListenerHelper.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkListenerHelper.scala
index f2808a4e9b9..6093e866080 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkListenerHelper.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkListenerHelper.scala
@@ -17,25 +17,20 @@
package org.apache.kyuubi.plugin.lineage.helper
+import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SPARK_VERSION
+import org.apache.spark.kyuubi.lineage.SparkContextHelper
+
+import org.apache.kyuubi.util.SemanticVersion
object SparkListenerHelper {
- lazy val sparkMajorMinorVersion: (Int, Int) = {
- val runtimeSparkVer = org.apache.spark.SPARK_VERSION
- val runtimeVersion = SemanticVersion(runtimeSparkVer)
- (runtimeVersion.majorVersion, runtimeVersion.minorVersion)
- }
+ lazy val SPARK_RUNTIME_VERSION: SemanticVersion = SemanticVersion(SPARK_VERSION)
- def isSparkVersionAtMost(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtMost(targetVersionString)
- }
+ def currentUser: String = UserGroupInformation.getCurrentUser.getShortUserName
- def isSparkVersionAtLeast(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionAtLeast(targetVersionString)
- }
+ def sessionUser: Option[String] =
+ Option(SparkContextHelper.globalSparkContext.getLocalProperty(KYUUBI_SESSION_USER))
- def isSparkVersionEqualTo(targetVersionString: String): Boolean = {
- SemanticVersion(SPARK_VERSION).isVersionEqualTo(targetVersionString)
- }
+ final val KYUUBI_SESSION_USER = "kyuubi.session.user"
}
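For context on the `SPARK_RUNTIME_VERSION <= "3.1"` comparisons this refactor introduces: they rely on `SemanticVersion` exposing comparison operators that accept a `"major.minor"` string directly. The following is a minimal, hypothetical sketch of that idea, not the exact `kyuubi-util` implementation; names and the regex are assumptions based on the removed code above.

```scala
// Sketch of a version type supporting e.g. `SPARK_RUNTIME_VERSION <= "3.1"`.
// Hypothetical simplification; the real SemanticVersion lives in kyuubi-util.
case class SemanticVersion(major: Int, minor: Int) extends Ordered[SemanticVersion] {
  // Compare by major first, then minor.
  override def compare(that: SemanticVersion): Int =
    if (major != that.major) major.compare(that.major) else minor.compare(that.minor)

  // String-accepting overloads so call sites can pass "3.1" directly.
  def <=(target: String): Boolean = this <= SemanticVersion(target)
  def >=(target: String): Boolean = this >= SemanticVersion(target)

  override def toString: String = s"$major.$minor"
}

object SemanticVersion {
  // Accepts "3.1", "3.1.2", "3.1.2-SNAPSHOT", etc.; only major.minor are kept.
  private val VersionPattern = """^(\d+)\.(\d+)(\..*)?$""".r

  def apply(versionString: String): SemanticVersion = versionString match {
    case VersionPattern(major, minor, _) => SemanticVersion(major.toInt, minor.toInt)
    case _ =>
      throw new IllegalArgumentException(s"Cannot parse '$versionString' as major.minor")
  }
}
```

With this shape, `SemanticVersion("3.5.0") <= "3.1"` is false while `SemanticVersion("3.1.2") <= "3.1"` is true, matching how the patch gates Spark-version-specific reflection.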
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala
index f060cc99422..27311146454 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala
@@ -18,7 +18,7 @@
package org.apache.kyuubi.plugin.lineage.helper
import scala.collection.immutable.ListMap
-import scala.util.{Failure, Success, Try}
+import scala.util.Try
import org.apache.spark.internal.Logging
import org.apache.spark.kyuubi.lineage.{LineageConf, SparkContextHelper}
@@ -37,7 +37,8 @@ import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Relation, DataSourceV2ScanRelation}
import org.apache.kyuubi.plugin.lineage.Lineage
-import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.isSparkVersionAtMost
+import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.SPARK_RUNTIME_VERSION
+import org.apache.kyuubi.util.reflect.ReflectUtils._
trait LineageParser {
def sparkSession: SparkSession
@@ -52,7 +53,7 @@ trait LineageParser {
val columnsLineage =
extractColumnsLineage(plan, ListMap[Attribute, AttributeSet]()).toList.collect {
case (k, attrs) =>
- k.name -> attrs.map(_.qualifiedName).toSet
+ k.name -> attrs.map(attr => (attr.qualifier :+ attr.name).mkString(".")).toSet
}
val (inputTables, outputTables) = columnsLineage.foldLeft((List[String](), List[String]())) {
case ((inputs, outputs), (out, in)) =>
@@ -189,40 +190,41 @@ trait LineageParser {
plan match {
// For command
case p if p.nodeName == "CommandResult" =>
- val commandPlan = getPlanField[LogicalPlan]("commandLogicalPlan", plan)
+ val commandPlan = getField[LogicalPlan](plan, "commandLogicalPlan")
extractColumnsLineage(commandPlan, parentColumnsLineage)
case p if p.nodeName == "AlterViewAsCommand" =>
val query =
- if (isSparkVersionAtMost("3.1")) {
+ if (SPARK_RUNTIME_VERSION <= "3.1") {
sparkSession.sessionState.analyzer.execute(getQuery(plan))
} else {
getQuery(plan)
}
- val view = getPlanField[TableIdentifier]("name", plan).unquotedString
+ val view = getV1TableName(getField[TableIdentifier](plan, "name").unquotedString)
extractColumnsLineage(query, parentColumnsLineage).map { case (k, v) =>
k.withName(s"$view.${k.name}") -> v
}
case p
if p.nodeName == "CreateViewCommand"
- && getPlanField[ViewType]("viewType", plan) == PersistedView =>
- val view = getPlanField[TableIdentifier]("name", plan).unquotedString
+ && getField[ViewType](plan, "viewType") == PersistedView =>
+ val view = getV1TableName(getField[TableIdentifier](plan, "name").unquotedString)
val outputCols =
- getPlanField[Seq[(String, Option[String])]]("userSpecifiedColumns", plan).map(_._1)
+ getField[Seq[(String, Option[String])]](plan, "userSpecifiedColumns").map(_._1)
val query =
- if (isSparkVersionAtMost("3.1")) {
- sparkSession.sessionState.analyzer.execute(getPlanField[LogicalPlan]("child", plan))
+ if (SPARK_RUNTIME_VERSION <= "3.1") {
+ sparkSession.sessionState.analyzer.execute(getField[LogicalPlan](plan, "child"))
} else {
- getPlanField[LogicalPlan]("plan", plan)
+ getField[LogicalPlan](plan, "plan")
}
- extractColumnsLineage(query, parentColumnsLineage).zipWithIndex.map {
+ val lineages = extractColumnsLineage(query, parentColumnsLineage).zipWithIndex.map {
case ((k, v), i) if outputCols.nonEmpty => k.withName(s"$view.${outputCols(i)}") -> v
case ((k, v), _) => k.withName(s"$view.${k.name}") -> v
- }
+ }.toSeq
+ ListMap[Attribute, AttributeSet](lineages: _*)
case p if p.nodeName == "CreateDataSourceTableAsSelectCommand" =>
- val table = getPlanField[CatalogTable]("table", plan).qualifiedName
+ val table = getV1TableName(getField[CatalogTable](plan, "table").qualifiedName)
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
k.withName(s"$table.${k.name}") -> v
}
@@ -230,7 +232,7 @@ trait LineageParser {
case p
if p.nodeName == "CreateHiveTableAsSelectCommand" ||
p.nodeName == "OptimizedCreateHiveTableAsSelectCommand" =>
- val table = getPlanField[CatalogTable]("tableDesc", plan).qualifiedName
+ val table = getV1TableName(getField[CatalogTable](plan, "tableDesc").qualifiedName)
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
k.withName(s"$table.${k.name}") -> v
}
@@ -239,17 +241,17 @@ trait LineageParser {
if p.nodeName == "CreateTableAsSelect" ||
p.nodeName == "ReplaceTableAsSelect" =>
val (table, namespace, catalog) =
- if (isSparkVersionAtMost("3.2")) {
+ if (SPARK_RUNTIME_VERSION <= "3.2") {
(
- getPlanField[Identifier]("tableName", plan).name,
- getPlanField[Identifier]("tableName", plan).namespace.mkString("."),
- getPlanField[TableCatalog]("catalog", plan).name())
+ getField[Identifier](plan, "tableName").name,
+ getField[Identifier](plan, "tableName").namespace.mkString("."),
+ getField[TableCatalog](plan, "catalog").name())
} else {
(
- getPlanMethod[Identifier]("tableName", plan).name(),
- getPlanMethod[Identifier]("tableName", plan).namespace().mkString("."),
- getCurrentPlanField[CatalogPlugin](
- getPlanMethod[LogicalPlan]("left", plan),
+ invokeAs[Identifier](plan, "tableName").name(),
+ invokeAs[Identifier](plan, "tableName").namespace().mkString("."),
+ getField[CatalogPlugin](
+ invokeAs[LogicalPlan](plan, "name"),
"catalog").name())
}
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
@@ -257,8 +259,9 @@ trait LineageParser {
}
case p if p.nodeName == "InsertIntoDataSourceCommand" =>
- val logicalRelation = getPlanField[LogicalRelation]("logicalRelation", plan)
- val table = logicalRelation.catalogTable.map(_.qualifiedName).getOrElse("")
+ val logicalRelation = getField[LogicalRelation](plan, "logicalRelation")
+ val table = logicalRelation
+ .catalogTable.map(t => getV1TableName(t.qualifiedName)).getOrElse("")
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map {
case (k, v) if table.nonEmpty =>
k.withName(s"$table.${k.name}") -> v
@@ -266,8 +269,9 @@ trait LineageParser {
case p if p.nodeName == "InsertIntoHadoopFsRelationCommand" =>
val table =
- getPlanField[Option[CatalogTable]]("catalogTable", plan).map(_.qualifiedName).getOrElse(
- "")
+ getField[Option[CatalogTable]](plan, "catalogTable")
+ .map(t => getV1TableName(t.qualifiedName))
+ .getOrElse("")
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map {
case (k, v) if table.nonEmpty =>
k.withName(s"$table.${k.name}") -> v
@@ -277,15 +281,15 @@ trait LineageParser {
if p.nodeName == "InsertIntoDataSourceDirCommand" ||
p.nodeName == "InsertIntoHiveDirCommand" =>
val dir =
- getPlanField[CatalogStorageFormat]("storage", plan).locationUri.map(_.toString).getOrElse(
- "")
+ getField[CatalogStorageFormat](plan, "storage").locationUri.map(_.toString)
+ .getOrElse("")
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map {
case (k, v) if dir.nonEmpty =>
k.withName(s"`$dir`.${k.name}") -> v
}
case p if p.nodeName == "InsertIntoHiveTable" =>
- val table = getPlanField[CatalogTable]("table", plan).qualifiedName
+ val table = getV1TableName(getField[CatalogTable](plan, "table").qualifiedName)
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
k.withName(s"$table.${k.name}") -> v
}
@@ -297,14 +301,14 @@ trait LineageParser {
if p.nodeName == "AppendData"
|| p.nodeName == "OverwriteByExpression"
|| p.nodeName == "OverwritePartitionsDynamic" =>
- val table = getPlanField[NamedRelation]("table", plan).name
+ val table = getV2TableName(getField[NamedRelation](plan, "table"))
extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
k.withName(s"$table.${k.name}") -> v
}
case p if p.nodeName == "MergeIntoTable" =>
- val matchedActions = getPlanField[Seq[MergeAction]]("matchedActions", plan)
- val notMatchedActions = getPlanField[Seq[MergeAction]]("notMatchedActions", plan)
+ val matchedActions = getField[Seq[MergeAction]](plan, "matchedActions")
+ val notMatchedActions = getField[Seq[MergeAction]](plan, "notMatchedActions")
val allAssignments = (matchedActions ++ notMatchedActions).collect {
case UpdateAction(_, assignments) => assignments
case InsertAction(_, assignments) => assignments
@@ -314,14 +318,15 @@ trait LineageParser {
assignment.key.asInstanceOf[Attribute],
assignment.value.references)
}: _*)
- val targetTable = getPlanField[LogicalPlan]("targetTable", plan)
- val sourceTable = getPlanField[LogicalPlan]("sourceTable", plan)
+ val targetTable = getField[LogicalPlan](plan, "targetTable")
+ val sourceTable = getField[LogicalPlan](plan, "sourceTable")
val targetColumnsLineage = extractColumnsLineage(
targetTable,
nextColumnsLlineage.map { case (k, _) => (k, AttributeSet(k)) })
val sourceColumnsLineage = extractColumnsLineage(sourceTable, nextColumnsLlineage)
val targetColumnsWithTargetTable = targetColumnsLineage.values.flatten.map { column =>
- column.withName(s"${column.qualifiedName}")
+ val unquotedQualifiedName = (column.qualifier :+ column.name).mkString(".")
+ column.withName(unquotedQualifiedName)
}
ListMap(targetColumnsWithTargetTable.zip(sourceColumnsLineage.values).toSeq: _*)
@@ -408,22 +413,22 @@ trait LineageParser {
joinColumnsLineage(parentColumnsLineage, childrenColumnsLineage)
case p: LogicalRelation if p.catalogTable.nonEmpty =>
- val tableName = p.catalogTable.get.qualifiedName
+ val tableName = getV1TableName(p.catalogTable.get.qualifiedName)
joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
case p: HiveTableRelation =>
- val tableName = p.tableMeta.qualifiedName
+ val tableName = getV1TableName(p.tableMeta.qualifiedName)
joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
case p: DataSourceV2ScanRelation =>
- val tableName = p.name
+ val tableName = getV2TableName(p)
joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
// For creating the view from v2 table, the logical plan of table will
// be the `DataSourceV2Relation` not the `DataSourceV2ScanRelation`.
// because the view from the table is not going to read it.
case p: DataSourceV2Relation =>
- val tableName = p.name
+ val tableName = getV2TableName(p)
joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
case p: LocalRelation =>
@@ -444,7 +449,7 @@ trait LineageParser {
case p: View =>
if (!p.isTempView && SparkContextHelper.getConf(
LineageConf.SKIP_PARSING_PERMANENT_VIEW_ENABLED)) {
- val viewName = p.desc.qualifiedName
+ val viewName = getV1TableName(p.desc.qualifiedName)
joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(viewName))
} else {
val viewColumnsLineage =
@@ -474,47 +479,32 @@ trait LineageParser {
}
}
- private def getPlanField[T](field: String, plan: LogicalPlan): T = {
- getFieldVal[T](plan, field)
- }
-
- private def getCurrentPlanField[T](curPlan: LogicalPlan, field: String): T = {
- getFieldVal[T](curPlan, field)
- }
-
- private def getPlanMethod[T](name: String, plan: LogicalPlan): T = {
- getMethod[T](plan, name)
- }
-
- private def getQuery(plan: LogicalPlan): LogicalPlan = {
- getPlanField[LogicalPlan]("query", plan)
- }
+ private def getQuery(plan: LogicalPlan): LogicalPlan = getField[LogicalPlan](plan, "query")
- private def getFieldVal[T](o: Any, name: String): T = {
- Try {
- val field = o.getClass.getDeclaredField(name)
- field.setAccessible(true)
- field.get(o)
- } match {
- case Success(value) => value.asInstanceOf[T]
- case Failure(e) =>
- val candidates = o.getClass.getDeclaredFields.map(_.getName).mkString("[", ",", "]")
- throw new RuntimeException(s"$name not in $candidates", e)
+ private def getV2TableName(plan: NamedRelation): String = {
+ plan match {
+ case relation: DataSourceV2ScanRelation =>
+ val catalog = relation.relation.catalog.map(_.name()).getOrElse(LineageConf.DEFAULT_CATALOG)
+ val database = relation.relation.identifier.get.namespace().mkString(".")
+ val table = relation.relation.identifier.get.name()
+ s"$catalog.$database.$table"
+ case relation: DataSourceV2Relation =>
+ val catalog = relation.catalog.map(_.name()).getOrElse(LineageConf.DEFAULT_CATALOG)
+ val database = relation.identifier.get.namespace().mkString(".")
+ val table = relation.identifier.get.name()
+ s"$catalog.$database.$table"
+ case _ =>
+ plan.name
}
}
- private def getMethod[T](o: Any, name: String): T = {
- Try {
- val method = o.getClass.getDeclaredMethod(name)
- method.invoke(o)
- } match {
- case Success(value) => value.asInstanceOf[T]
- case Failure(e) =>
- val candidates = o.getClass.getDeclaredMethods.map(_.getName).mkString("[", ",", "]")
- throw new RuntimeException(s"$name not in $candidates", e)
+ private def getV1TableName(qualifiedName: String): String = {
+ qualifiedName.split("\\.") match {
+ case Array(database, table) =>
+ Seq(LineageConf.DEFAULT_CATALOG, database, table).filter(_.nonEmpty).mkString(".")
+ case _ => qualifiedName
}
}
-
}
case class SparkSQLLineageParseHelper(sparkSession: SparkSession) extends LineageParser
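The `getV1TableName` helper added above normalizes v1 `database.table` names into three-part `catalog.database.table` form, which is why the test expectations later in this patch gain a `$DEFAULT_CATALOG.` prefix. A standalone sketch of that qualification logic (with `defaultCatalog` as a hypothetical stand-in for `LineageConf.DEFAULT_CATALOG`):

```scala
// Sketch of the v1 table-name qualification from the patch.
// `defaultCatalog` stands in for LineageConf.DEFAULT_CATALOG (session catalog).
val defaultCatalog = "spark_catalog"

def getV1TableName(qualifiedName: String): String =
  qualifiedName.split("\\.") match {
    case Array(database, table) =>
      // Two-part name: prepend the default catalog, dropping empty segments.
      Seq(defaultCatalog, database, table).filter(_.nonEmpty).mkString(".")
    case _ =>
      // Already three-part (or unparseable): leave unchanged.
      qualifiedName
  }
```

So `getV1TableName("default.test_table0")` yields `"spark_catalog.default.test_table0"`, while an already-qualified v2 name such as `"v2_catalog.db.tbb"` passes through untouched.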
diff --git a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/spark/kyuubi/lineage/LineageConf.scala b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/spark/kyuubi/lineage/LineageConf.scala
index 6fb5399c059..e264b1f3596 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/spark/kyuubi/lineage/LineageConf.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/spark/kyuubi/lineage/LineageConf.scala
@@ -18,6 +18,7 @@
package org.apache.spark.kyuubi.lineage
import org.apache.spark.internal.config.ConfigBuilder
+import org.apache.spark.sql.internal.SQLConf
import org.apache.kyuubi.plugin.lineage.LineageDispatcherType
@@ -35,6 +36,7 @@ object LineageConf {
"`org.apache.kyuubi.plugin.lineage.LineageDispatcher` for dispatching lineage events.<ul>" +
"<li>SPARK_EVENT: send lineage event to spark event bus</li>" +
"<li>KYUUBI_EVENT: send lineage event to kyuubi event bus</li>" +
+ "<li>ATLAS: send lineage to apache atlas</li>" +
"</ul>")
.version("1.8.0")
.stringConf
@@ -44,4 +46,6 @@ object LineageConf {
"Unsupported lineage dispatchers")
.createWithDefault(Seq(LineageDispatcherType.SPARK_EVENT.toString))
+ val DEFAULT_CATALOG: String = SQLConf.get.getConf(SQLConf.DEFAULT_CATALOG)
+
}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/test/resources/atlas-application.properties b/extensions/spark/kyuubi-spark-lineage/src/test/resources/atlas-application.properties
new file mode 100644
index 00000000000..e6dc52f98f1
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/test/resources/atlas-application.properties
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+atlas.cluster.name=test
diff --git a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala
new file mode 100644
index 00000000000..8e8d18f216e
--- /dev/null
+++ b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.plugin.lineage.dispatcher.atlas
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.atlas.model.instance.{AtlasEntity, AtlasObjectId}
+import org.apache.commons.lang3.StringUtils
+import org.apache.spark.SparkConf
+import org.apache.spark.kyuubi.lineage.LineageConf.{DEFAULT_CATALOG, DISPATCHERS, SKIP_PARSING_PERMANENT_VIEW_ENABLED}
+import org.apache.spark.kyuubi.lineage.SparkContextHelper
+import org.apache.spark.sql.SparkListenerExtensionTest
+import org.scalatest.concurrent.PatienceConfiguration.Timeout
+import org.scalatest.time.SpanSugar._
+
+import org.apache.kyuubi.KyuubiFunSuite
+import org.apache.kyuubi.plugin.lineage.Lineage
+import org.apache.kyuubi.plugin.lineage.dispatcher.atlas.AtlasEntityHelper.{buildColumnQualifiedName, buildTableQualifiedName, COLUMN_LINEAGE_TYPE, PROCESS_TYPE}
+import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.SPARK_RUNTIME_VERSION
+
+class AtlasLineageDispatcherSuite extends KyuubiFunSuite with SparkListenerExtensionTest {
+ val catalogName =
+ if (SPARK_RUNTIME_VERSION <= "3.1") "org.apache.spark.sql.connector.InMemoryTableCatalog"
+ else "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog"
+
+ override protected val catalogImpl: String = "hive"
+
+ override def sparkConf(): SparkConf = {
+ super.sparkConf()
+ .set("spark.sql.catalog.v2_catalog", catalogName)
+ .set(
+ "spark.sql.queryExecutionListeners",
+ "org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener")
+ .set(DISPATCHERS.key, "ATLAS")
+ .set(SKIP_PARSING_PERMANENT_VIEW_ENABLED.key, "true")
+ }
+
+ override def afterAll(): Unit = {
+ spark.stop()
+ super.afterAll()
+ }
+
+ test("atlas lineage capture: insert into select sql") {
+ val mockAtlasClient = new MockAtlasClient()
+ AtlasClient.setClient(mockAtlasClient)
+
+ withTable("test_table0", "test_table1") { _ =>
+ spark.sql("create table test_table0(a string, b int, c int)")
+ spark.sql("create table test_table1(a string, d int)")
+ spark.sql("insert into test_table1 select a, b + c as d from test_table0").collect()
+ val expected = Lineage(
+ List(s"$DEFAULT_CATALOG.default.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.test_table1"),
+ List(
+ (
+ s"$DEFAULT_CATALOG.default.test_table1.a",
+ Set(s"$DEFAULT_CATALOG.default.test_table0.a")),
+ (
+ s"$DEFAULT_CATALOG.default.test_table1.d",
+ Set(
+ s"$DEFAULT_CATALOG.default.test_table0.b",
+ s"$DEFAULT_CATALOG.default.test_table0.c"))))
+ eventually(Timeout(5.seconds)) {
+ assert(mockAtlasClient.getEntities != null && mockAtlasClient.getEntities.nonEmpty)
+ }
+ checkAtlasProcessEntity(mockAtlasClient.getEntities.head, expected)
+ checkAtlasColumnLineageEntities(
+ mockAtlasClient.getEntities.head,
+ mockAtlasClient.getEntities.tail,
+ expected)
+ }
+
+ }
+
+ def checkAtlasProcessEntity(entity: AtlasEntity, expected: Lineage): Unit = {
+ assert(entity.getTypeName == PROCESS_TYPE)
+
+ val appId = SparkContextHelper.globalSparkContext.applicationId
+ assert(entity.getAttribute("qualifiedName") == appId)
+ assert(entity.getAttribute("name")
+ == s"${SparkContextHelper.globalSparkContext.appName} $appId")
+ assert(StringUtils.isNotBlank(entity.getAttribute("currUser").asInstanceOf[String]))
+ assert(entity.getAttribute("executionId") != null)
+ assert(StringUtils.isNotBlank(entity.getAttribute("details").asInstanceOf[String]))
+ assert(StringUtils.isNotBlank(entity.getAttribute("sparkPlanDescription").asInstanceOf[String]))
+
+ val inputs = entity.getRelationshipAttribute("inputs")
+ .asInstanceOf[util.Collection[AtlasObjectId]].asScala.map(getQualifiedName)
+ val outputs = entity.getRelationshipAttribute("outputs")
+ .asInstanceOf[util.Collection[AtlasObjectId]].asScala.map(getQualifiedName)
+ assertResult(expected.inputTables
+ .flatMap(buildTableQualifiedName(_).toSeq))(inputs)
+ assertResult(expected.outputTables
+ .flatMap(buildTableQualifiedName(_).toSeq))(outputs)
+ }
+
+ def checkAtlasColumnLineageEntities(
+ processEntity: AtlasEntity,
+ entities: Seq[AtlasEntity],
+ expected: Lineage): Unit = {
+ assert(entities.size == expected.columnLineage.size)
+
+ entities.zip(expected.columnLineage).foreach {
+ case (entity, expectedLineage) =>
+ assert(entity.getTypeName == COLUMN_LINEAGE_TYPE)
+ val expectedQualifiedName =
+ s"${processEntity.getAttribute("qualifiedName")}:" +
+ s"${buildColumnQualifiedName(expectedLineage.column).get}"
+ assert(entity.getAttribute("qualifiedName") == expectedQualifiedName)
+ assert(entity.getAttribute("name") == expectedQualifiedName)
+
+ val inputs = entity.getRelationshipAttribute("inputs")
+ .asInstanceOf[util.Collection[AtlasObjectId]].asScala.map(getQualifiedName)
+ assertResult(expectedLineage.originalColumns
+ .flatMap(buildColumnQualifiedName(_).toSet))(inputs.toSet)
+
+ val outputs = entity.getRelationshipAttribute("outputs")
+ .asInstanceOf[util.Collection[AtlasObjectId]].asScala.map(getQualifiedName)
+ assert(outputs.size == 1)
+ assert(buildColumnQualifiedName(expectedLineage.column).toSeq.head == outputs.head)
+
+ assert(getQualifiedName(entity.getRelationshipAttribute("process").asInstanceOf[
+ AtlasObjectId]) == processEntity.getAttribute("qualifiedName"))
+ }
+ }
+
+ // Pre-set cluster name for testing in `test/resources/atlas-application.properties`
+ private val cluster = "test"
+
+ def getQualifiedName(objId: AtlasObjectId): String = {
+ objId.getUniqueAttributes.get("qualifiedName").asInstanceOf[String]
+ }
+
+ class MockAtlasClient() extends AtlasClient {
+ private var _entities: Seq[AtlasEntity] = _
+
+ override def send(entities: Seq[AtlasEntity]): Unit = {
+ _entities = entities
+ }
+
+ def getEntities: Seq[AtlasEntity] = _entities
+
+ override def close(): Unit = {}
+ }
+}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/events/OperationLineageEventSuite.scala b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/events/OperationLineageEventSuite.scala
index 67e94ad0b79..378eb3bb460 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/events/OperationLineageEventSuite.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/events/OperationLineageEventSuite.scala
@@ -19,8 +19,6 @@ package org.apache.kyuubi.plugin.lineage.events
import java.util.concurrent.{CountDownLatch, TimeUnit}
-import scala.collection.immutable.List
-
import org.apache.spark.SparkConf
import org.apache.spark.kyuubi.lineage.LineageConf._
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
@@ -30,12 +28,12 @@ import org.apache.kyuubi.KyuubiFunSuite
import org.apache.kyuubi.events.EventBus
import org.apache.kyuubi.plugin.lineage.Lineage
import org.apache.kyuubi.plugin.lineage.dispatcher.{OperationLineageKyuubiEvent, OperationLineageSparkEvent}
-import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.isSparkVersionAtMost
+import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.SPARK_RUNTIME_VERSION
class OperationLineageEventSuite extends KyuubiFunSuite with SparkListenerExtensionTest {
val catalogName =
- if (isSparkVersionAtMost("3.1")) "org.apache.spark.sql.connector.InMemoryTableCatalog"
+ if (SPARK_RUNTIME_VERSION <= "3.1") "org.apache.spark.sql.connector.InMemoryTableCatalog"
else "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog"
override protected val catalogImpl: String = "hive"
@@ -82,11 +80,11 @@ class OperationLineageEventSuite extends KyuubiFunSuite with SparkListenerExtens
spark.sql("create table test_table0(a string, b string)")
spark.sql("select a as col0, b as col1 from test_table0").collect()
val expected = Lineage(
- List("default.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.test_table0"),
List(),
List(
- ("col0", Set("default.test_table0.a")),
- ("col1", Set("default.test_table0.b"))))
+ ("col0", Set(s"$DEFAULT_CATALOG.default.test_table0.a")),
+ ("col1", Set(s"$DEFAULT_CATALOG.default.test_table0.b"))))
countDownLatch.await(20, TimeUnit.SECONDS)
assert(actualSparkEventLineage == expected)
assert(actualKyuubiEventLineage == expected)
@@ -97,11 +95,11 @@ class OperationLineageEventSuite extends KyuubiFunSuite with SparkListenerExtens
val countDownLatch = new CountDownLatch(1)
var executionId: Long = -1
val expected = Lineage(
- List("default.table1", "default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table1", s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("aa", Set("default.table1.a")),
- ("bb", Set("default.table0.b"))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table0.b"))))
spark.sparkContext.addSparkListener(new SparkListener {
override def onOtherEvent(event: SparkListenerEvent): Unit = {
@@ -163,11 +161,11 @@ class OperationLineageEventSuite extends KyuubiFunSuite with SparkListenerExtens
s" where a in ('HELLO') and c = 'HELLO'").collect()
val expected = Lineage(
- List("default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
List(),
List(
- ("k", Set("default.t2.a")),
- ("b", Set("default.t2.b"))))
+ ("k", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.t2.b"))))
countDownLatch.await(20, TimeUnit.SECONDS)
assert(actual == expected)
}
diff --git a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
index 96003f051f5..3c19163db42 100644
--- a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
+++ b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
@@ -17,7 +17,6 @@
package org.apache.kyuubi.plugin.lineage.helper
-import scala.collection.immutable.List
import scala.reflect.io.File
import org.apache.spark.SparkConf
@@ -30,15 +29,16 @@ import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import org.apache.kyuubi.KyuubiFunSuite
import org.apache.kyuubi.plugin.lineage.Lineage
-import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.isSparkVersionAtMost
+import org.apache.kyuubi.plugin.lineage.helper.SparkListenerHelper.SPARK_RUNTIME_VERSION
class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
with SparkListenerExtensionTest {
val catalogName =
- if (isSparkVersionAtMost("3.1")) "org.apache.spark.sql.connector.InMemoryTableCatalog"
+ if (SPARK_RUNTIME_VERSION <= "3.1") "org.apache.spark.sql.connector.InMemoryTableCatalog"
else "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog"
+ val DEFAULT_CATALOG = LineageConf.DEFAULT_CATALOG
override protected val catalogImpl: String = "hive"
override def sparkConf(): SparkConf = {
@@ -75,22 +75,28 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withView("alterviewascommand", "alterviewascommand1") { _ =>
spark.sql("create view alterviewascommand as select key from test_db0.test_table0")
val ret0 =
- exectractLineage("alter view alterviewascommand as select key from test_db0.test_table0")
+ extractLineage("alter view alterviewascommand as select key from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.alterviewascommand"),
- List(("default.alterviewascommand.key", Set("test_db0.test_table0.key")))))
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.alterviewascommand"),
+ List((
+ s"$DEFAULT_CATALOG.default.alterviewascommand.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")))))
spark.sql("create view alterviewascommand1 as select * from test_db0.test_table0")
val ret1 =
- exectractLineage("alter view alterviewascommand1 as select * from test_db0.test_table0")
+ extractLineage("alter view alterviewascommand1 as select * from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
- List("default.alterviewascommand1"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.alterviewascommand1"),
List(
- ("default.alterviewascommand1.key", Set("test_db0.test_table0.key")),
- ("default.alterviewascommand1.value", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.alterviewascommand1.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.alterviewascommand1.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -102,16 +108,16 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
ddls.split("\n").filter(_.nonEmpty).foreach(spark.sql(_).collect())
withView("test_view") { _ =>
- val result = exectractLineage(
+ val result = extractLineage(
"create view test_view(a, b, c) as" +
" select col1 as a, col2 as b, col3 as c from v2_catalog.db.tbb")
assert(result == Lineage(
List("v2_catalog.db.tbb"),
- List("default.test_view"),
+ List(s"$DEFAULT_CATALOG.default.test_view"),
List(
- ("default.test_view.a", Set("v2_catalog.db.tbb.col1")),
- ("default.test_view.b", Set("v2_catalog.db.tbb.col2")),
- ("default.test_view.c", Set("v2_catalog.db.tbb.col3")))))
+ (s"$DEFAULT_CATALOG.default.test_view.a", Set("v2_catalog.db.tbb.col1")),
+ (s"$DEFAULT_CATALOG.default.test_view.b", Set("v2_catalog.db.tbb.col2")),
+ (s"$DEFAULT_CATALOG.default.test_view.c", Set("v2_catalog.db.tbb.col3")))))
}
}
@@ -123,36 +129,36 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
ddls.split("\n").filter(_.nonEmpty).foreach(spark.sql(_).collect())
withTable("v2_catalog.db.tb0") { _ =>
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table v2_catalog.db.tb0 " +
s"select key as col1, value as col2 from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List("v2_catalog.db.tb0"),
List(
- ("v2_catalog.db.tb0.col1", Set("test_db0.test_table0.key")),
- ("v2_catalog.db.tb0.col2", Set("test_db0.test_table0.value")))))
+ ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ ("v2_catalog.db.tb0.col2", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
val ret1 =
- exectractLineage(
+ extractLineage(
s"insert overwrite table v2_catalog.db.tb0 partition(col2) " +
s"select key as col1, value as col2 from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List("v2_catalog.db.tb0"),
List(
- ("v2_catalog.db.tb0.col1", Set("test_db0.test_table0.key")),
- ("v2_catalog.db.tb0.col2", Set("test_db0.test_table0.value")))))
+ ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ ("v2_catalog.db.tb0.col2", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
val ret2 =
- exectractLineage(
+ extractLineage(
s"insert overwrite table v2_catalog.db.tb0 partition(col2 = 'bb') " +
s"select key as col1 from test_db0.test_table0")
assert(ret2 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List("v2_catalog.db.tb0"),
List(
- ("v2_catalog.db.tb0.col1", Set("test_db0.test_table0.key")),
+ ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("v2_catalog.db.tb0.col2", Set()))))
}
}
@@ -166,7 +172,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|""".stripMargin
ddls.split("\n").filter(_.nonEmpty).foreach(spark.sql(_).collect())
withTable("v2_catalog.db.target_t", "v2_catalog.db.source_t") { _ =>
- val ret0 = exectractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
+ val ret0 = extractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
"USING v2_catalog.db.source_t AS source " +
"ON target.id = source.id " +
"WHEN MATCHED THEN " +
@@ -181,7 +187,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
("v2_catalog.db.target_t.name", Set("v2_catalog.db.source_t.name")),
("v2_catalog.db.target_t.price", Set("v2_catalog.db.source_t.price")))))
- val ret1 = exectractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
+ val ret1 = extractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
"USING v2_catalog.db.source_t AS source " +
"ON target.id = source.id " +
"WHEN MATCHED THEN " +
@@ -196,7 +202,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
("v2_catalog.db.target_t.name", Set("v2_catalog.db.source_t.name")),
("v2_catalog.db.target_t.price", Set("v2_catalog.db.source_t.price")))))
- val ret2 = exectractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
+ val ret2 = extractLineageWithoutExecuting("MERGE INTO v2_catalog.db.target_t AS target " +
"USING (select a.id, a.name, b.price " +
"from v2_catalog.db.source_t a join v2_catalog.db.pivot_t b) AS source " +
"ON target.id = source.id " +
@@ -218,32 +224,44 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
test("columns lineage extract - CreateViewCommand") {
withView("createviewcommand", "createviewcommand1", "createviewcommand2") { _ =>
- val ret0 = exectractLineage(
+ val ret0 = extractLineage(
"create view createviewcommand(a, b) as select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.createviewcommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createviewcommand"),
List(
- ("default.createviewcommand.a", Set("test_db0.test_table0.key")),
- ("default.createviewcommand.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
- val ret1 = exectractLineage(
+ val ret1 = extractLineage(
"create view createviewcommand1 as select key, value from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
- List("default.createviewcommand1"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createviewcommand1"),
List(
- ("default.createviewcommand1.key", Set("test_db0.test_table0.key")),
- ("default.createviewcommand1.value", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand1.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand1.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
- val ret2 = exectractLineage(
+ val ret2 = extractLineage(
"create view createviewcommand2 as select * from test_db0.test_table0")
assert(ret2 == Lineage(
- List("test_db0.test_table0"),
- List("default.createviewcommand2"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createviewcommand2"),
List(
- ("default.createviewcommand2.key", Set("test_db0.test_table0.key")),
- ("default.createviewcommand2.value", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand2.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createviewcommand2.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -251,67 +269,81 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withTable("createdatasourcetableasselectcommand", "createdatasourcetableasselectcommand1") {
_ =>
val ret0 =
- exectractLineage("create table createdatasourcetableasselectcommand using parquet" +
+ extractLineage("create table createdatasourcetableasselectcommand using parquet" +
" AS SELECT key, value FROM test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.createdatasourcetableasselectcommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand"),
List(
- ("default.createdatasourcetableasselectcommand.key", Set("test_db0.test_table0.key")),
(
- "default.createdatasourcetableasselectcommand.value",
- Set("test_db0.test_table0.value")))))
+ s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
val ret1 =
- exectractLineage("create table createdatasourcetableasselectcommand1 using parquet" +
+ extractLineage("create table createdatasourcetableasselectcommand1 using parquet" +
" AS SELECT * FROM test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
- List("default.createdatasourcetableasselectcommand1"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand1"),
List(
- ("default.createdatasourcetableasselectcommand1.key", Set("test_db0.test_table0.key")),
(
- "default.createdatasourcetableasselectcommand1.value",
- Set("test_db0.test_table0.value")))))
+ s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand1.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createdatasourcetableasselectcommand1.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
test("columns lineage extract - CreateHiveTableAsSelectCommand") {
withTable("createhivetableasselectcommand", "createhivetableasselectcommand1") { _ =>
- val ret0 = exectractLineage("create table createhivetableasselectcommand using hive" +
+ val ret0 = extractLineage("create table createhivetableasselectcommand using hive" +
" as select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.createhivetableasselectcommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createhivetableasselectcommand"),
List(
- ("default.createhivetableasselectcommand.key", Set("test_db0.test_table0.key")),
- ("default.createhivetableasselectcommand.value", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.createhivetableasselectcommand.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createhivetableasselectcommand.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
- val ret1 = exectractLineage("create table createhivetableasselectcommand1 using hive" +
+ val ret1 = extractLineage("create table createhivetableasselectcommand1 using hive" +
" as select * from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
- List("default.createhivetableasselectcommand1"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.createhivetableasselectcommand1"),
List(
- ("default.createhivetableasselectcommand1.key", Set("test_db0.test_table0.key")),
- ("default.createhivetableasselectcommand1.value", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.createhivetableasselectcommand1.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.createhivetableasselectcommand1.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
test("columns lineage extract - OptimizedCreateHiveTableAsSelectCommand") {
withTable("optimizedcreatehivetableasselectcommand") { _ =>
val ret =
- exectractLineage(
+ extractLineage(
"create table optimizedcreatehivetableasselectcommand stored as parquet " +
"as select * from test_db0.test_table0")
assert(ret == Lineage(
- List("test_db0.test_table0"),
- List("default.optimizedcreatehivetableasselectcommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.optimizedcreatehivetableasselectcommand"),
List(
- ("default.optimizedcreatehivetableasselectcommand.key", Set("test_db0.test_table0.key")),
(
- "default.optimizedcreatehivetableasselectcommand.value",
- Set("test_db0.test_table0.value")))))
+ s"$DEFAULT_CATALOG.default.optimizedcreatehivetableasselectcommand.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.optimizedcreatehivetableasselectcommand.value",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -319,27 +351,31 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withTable(
"v2_catalog.db.createhivetableasselectcommand",
"v2_catalog.db.createhivetableasselectcommand1") { _ =>
- val ret0 = exectractLineage("create table v2_catalog.db.createhivetableasselectcommand" +
+ val ret0 = extractLineage("create table v2_catalog.db.createhivetableasselectcommand" +
" as select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List("v2_catalog.db.createhivetableasselectcommand"),
List(
- ("v2_catalog.db.createhivetableasselectcommand.key", Set("test_db0.test_table0.key")),
+ (
+ "v2_catalog.db.createhivetableasselectcommand.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
(
"v2_catalog.db.createhivetableasselectcommand.value",
- Set("test_db0.test_table0.value")))))
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
- val ret1 = exectractLineage("create table v2_catalog.db.createhivetableasselectcommand1" +
+ val ret1 = extractLineage("create table v2_catalog.db.createhivetableasselectcommand1" +
" as select * from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List("v2_catalog.db.createhivetableasselectcommand1"),
List(
- ("v2_catalog.db.createhivetableasselectcommand1.key", Set("test_db0.test_table0.key")),
+ (
+ "v2_catalog.db.createhivetableasselectcommand1.key",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
(
"v2_catalog.db.createhivetableasselectcommand1.value",
- Set("test_db0.test_table0.value")))))
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -364,36 +400,48 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
spark.sessionState.catalog.createTable(newTable, ignoreIfExists = false)
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table $tableName select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.insertintodatasourcecommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.insertintodatasourcecommand"),
List(
- ("default.insertintodatasourcecommand.a", Set("test_db0.test_table0.key")),
- ("default.insertintodatasourcecommand.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
val ret1 =
- exectractLineage(
+ extractLineage(
s"insert into table $tableName select * from test_db0.test_table0")
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
- List("default.insertintodatasourcecommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.insertintodatasourcecommand"),
List(
- ("default.insertintodatasourcecommand.a", Set("test_db0.test_table0.key")),
- ("default.insertintodatasourcecommand.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
val ret2 =
- exectractLineage(
+ extractLineage(
s"insert into table $tableName " +
s"select (select key from test_db0.test_table1 limit 1) + 1 as aa, " +
s"value as bb from test_db0.test_table0")
assert(ret2 == Lineage(
- List("test_db0.test_table1", "test_db0.test_table0"),
- List("default.insertintodatasourcecommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table1", s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.insertintodatasourcecommand"),
List(
- ("default.insertintodatasourcecommand.a", Set("test_db0.test_table1.key")),
- ("default.insertintodatasourcecommand.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table1.key")),
+ (
+ s"$DEFAULT_CATALOG.default.insertintodatasourcecommand.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -403,15 +451,19 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withTable(tableName) { _ =>
spark.sql(s"CREATE TABLE $tableName (a int, b string) USING parquet")
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table $tableName select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List("default.insertintohadoopfsrelationcommand"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.insertintohadoopfsrelationcommand"),
List(
- ("default.insertintohadoopfsrelationcommand.a", Set("test_db0.test_table0.key")),
- ("default.insertintohadoopfsrelationcommand.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.insertintohadoopfsrelationcommand.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.insertintohadoopfsrelationcommand.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
@@ -419,33 +471,33 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
test("columns lineage extract - InsertIntoDatasourceDirCommand") {
val tableDirectory = getClass.getResource("/").getPath + "table_directory"
val directory = File(tableDirectory).createDirectory()
- val ret0 = exectractLineage(s"""
- |INSERT OVERWRITE DIRECTORY '$directory.path'
- |USING parquet
- |SELECT * FROM test_db0.test_table_part0""".stripMargin)
+ val ret0 = extractLineage(s"""
+ |INSERT OVERWRITE DIRECTORY '$directory.path'
+ |USING parquet
+ |SELECT * FROM test_db0.test_table_part0""".stripMargin)
assert(ret0 == Lineage(
- List("test_db0.test_table_part0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table_part0"),
List(s"""`$directory.path`"""),
List(
- (s"""`$directory.path`.key""", Set("test_db0.test_table_part0.key")),
- (s"""`$directory.path`.value""", Set("test_db0.test_table_part0.value")),
- (s"""`$directory.path`.pid""", Set("test_db0.test_table_part0.pid")))))
+ (s"""`$directory.path`.key""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.key")),
+ (s"""`$directory.path`.value""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.value")),
+ (s"""`$directory.path`.pid""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.pid")))))
}
test("columns lineage extract - InsertIntoHiveDirCommand") {
val tableDirectory = getClass.getResource("/").getPath + "table_directory"
val directory = File(tableDirectory).createDirectory()
- val ret0 = exectractLineage(s"""
- |INSERT OVERWRITE DIRECTORY '$directory.path'
- |USING parquet
- |SELECT * FROM test_db0.test_table_part0""".stripMargin)
+ val ret0 = extractLineage(s"""
+ |INSERT OVERWRITE DIRECTORY '$directory.path'
+ |USING parquet
+ |SELECT * FROM test_db0.test_table_part0""".stripMargin)
assert(ret0 == Lineage(
- List("test_db0.test_table_part0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table_part0"),
List(s"""`$directory.path`"""),
List(
- (s"""`$directory.path`.key""", Set("test_db0.test_table_part0.key")),
- (s"""`$directory.path`.value""", Set("test_db0.test_table_part0.value")),
- (s"""`$directory.path`.pid""", Set("test_db0.test_table_part0.pid")))))
+ (s"""`$directory.path`.key""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.key")),
+ (s"""`$directory.path`.value""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.value")),
+ (s"""`$directory.path`.pid""", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.pid")))))
}
test("columns lineage extract - InsertIntoHiveTable") {
@@ -453,41 +505,45 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withTable(tableName) { _ =>
spark.sql(s"CREATE TABLE $tableName (a int, b string) USING hive")
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table $tableName select * from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
- List(s"default.$tableName"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.default.$tableName"),
List(
- (s"default.$tableName.a", Set("test_db0.test_table0.key")),
- (s"default.$tableName.b", Set("test_db0.test_table0.value")))))
+ (
+ s"$DEFAULT_CATALOG.default.$tableName.a",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ (
+ s"$DEFAULT_CATALOG.default.$tableName.b",
+ Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
}
}
test("columns lineage extract - logical relation sql") {
- val ret0 = exectractLineage("select key, value from test_db0.test_table0")
+ val ret0 = extractLineage("select key, value from test_db0.test_table0")
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("key", Set("test_db0.test_table0.key")),
- ("value", Set("test_db0.test_table0.value")))))
+ ("key", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ ("value", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))
- val ret1 = exectractLineage("select * from test_db0.test_table_part0")
+ val ret1 = extractLineage("select * from test_db0.test_table_part0")
assert(ret1 == Lineage(
- List("test_db0.test_table_part0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table_part0"),
List(),
List(
- ("key", Set("test_db0.test_table_part0.key")),
- ("value", Set("test_db0.test_table_part0.value")),
- ("pid", Set("test_db0.test_table_part0.pid")))))
+ ("key", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.key")),
+ ("value", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.value")),
+ ("pid", Set(s"$DEFAULT_CATALOG.test_db0.test_table_part0.pid")))))
}
test("columns lineage extract - not generate lineage sql") {
- val ret0 = exectractLineage("create table test_table1(a string, b string, c string)")
+ val ret0 = extractLineage("create table test_table1(a string, b string, c string)")
assert(ret0 == Lineage(List[String](), List[String](), List[(String, Set[String])]()))
}
@@ -500,14 +556,14 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
ddls.split("\n").filter(_.nonEmpty).foreach(spark.sql(_).collect())
withTable("v2_catalog.db.tb") { _ =>
val sql0 = "select col1 from v2_catalog.db.tb"
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
List("v2_catalog.db.tb"),
List(),
List("col1" -> Set("v2_catalog.db.tb.col1"))))
val sql1 = "select col1, hash(hash(col1)) as col2 from v2_catalog.db.tb"
- val ret1 = exectractLineage(sql1)
+ val ret1 = extractLineage(sql1)
assert(ret1 == Lineage(
List("v2_catalog.db.tb"),
List(),
@@ -515,7 +571,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
val sql2 =
"select col1, case col1 when '1' then 's1' else col1 end col2 from v2_catalog.db.tb"
- val ret2 = exectractLineage(sql2)
+ val ret2 = extractLineage(sql2)
assert(ret2 == Lineage(
List("v2_catalog.db.tb"),
List(),
@@ -524,7 +580,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
val sql3 =
"select col1 as col2, 'col2' as col2, 'col2', first(col3) as col2 " +
"from v2_catalog.db.tb group by col1"
- val ret3 = exectractLineage(sql3)
+ val ret3 = extractLineage(sql3)
assert(ret3 == Lineage(
List("v2_catalog.db.tb"),
List[String](),
@@ -537,7 +593,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
val sql4 =
"select col1 as col2, sum(hash(col1) + hash(hash(col1))) " +
"from v2_catalog.db.tb group by col1"
- val ret4 = exectractLineage(sql4)
+ val ret4 = extractLineage(sql4)
assert(ret4 == Lineage(
List("v2_catalog.db.tb"),
List(),
@@ -553,7 +609,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
| on t1.col2 = t2.col3
| group by 1
|""".stripMargin
- val ret5 = exectractLineage(sql5)
+ val ret5 = extractLineage(sql5)
assert(ret5 == Lineage(
List("v2_catalog.db.tb"),
List(),
@@ -579,26 +635,26 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|select tmp0_0 as a0, tmp1_0 as a1 from tmp0 join tmp1 where tmp1_0 = tmp0_0
|""".stripMargin
val sql0ExpectResult = Lineage(
- List("default.tmp0", "default.tmp1"),
+ List(s"$DEFAULT_CATALOG.default.tmp0", s"$DEFAULT_CATALOG.default.tmp1"),
List(),
List(
- "a0" -> Set("default.tmp0.tmp0_0"),
- "a1" -> Set("default.tmp1.tmp1_0")))
+ "a0" -> Set(s"$DEFAULT_CATALOG.default.tmp0.tmp0_0"),
+ "a1" -> Set(s"$DEFAULT_CATALOG.default.tmp1.tmp1_0")))
val sql1 =
"""
|select count(tmp1_0) as cnt, tmp1_1 from tmp1 group by tmp1_1
|""".stripMargin
val sql1ExpectResult = Lineage(
- List("default.tmp1"),
+ List(s"$DEFAULT_CATALOG.default.tmp1"),
List(),
List(
- "cnt" -> Set("default.tmp1.tmp1_0"),
- "tmp1_1" -> Set("default.tmp1.tmp1_1")))
+ "cnt" -> Set(s"$DEFAULT_CATALOG.default.tmp1.tmp1_0"),
+ "tmp1_1" -> Set(s"$DEFAULT_CATALOG.default.tmp1.tmp1_1")))
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == sql0ExpectResult)
- val ret1 = exectractLineage(sql1)
+ val ret1 = extractLineage(sql1)
assert(ret1 == sql1ExpectResult)
}
}
@@ -658,17 +714,17 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|FROM goods_cat_new
|LIMIT 10""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
List(
- "test_db.goods_detail0",
+ s"$DEFAULT_CATALOG.test_db.goods_detail0",
"v2_catalog.test_db_v2.goods_detail1",
"v2_catalog.test_db_v2.mall_icon_schedule",
"v2_catalog.test_db_v2.mall_icon"),
List(),
List(
- ("goods_id", Set("test_db.goods_detail0.goods_id")),
- ("cate_grory", Set("test_db.goods_detail0.cat_id")),
+ ("goods_id", Set(s"$DEFAULT_CATALOG.test_db.goods_detail0.goods_id")),
+ ("cate_grory", Set(s"$DEFAULT_CATALOG.test_db.goods_detail0.cat_id")),
("cat_id", Set("v2_catalog.test_db_v2.goods_detail1.cat_id")),
("product_id", Set("v2_catalog.test_db_v2.goods_detail1.product_id")),
("start_time", Set("v2_catalog.test_db_v2.mall_icon_schedule.start_time")),
@@ -692,7 +748,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|on t1.col1 = t2.col1
|""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(
ret0 == Lineage(
List("v2_catalog.db.tb1", "v2_catalog.db.tb2"),
@@ -727,14 +783,26 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|select a, b, c as c from test_db.test_table1
|) a
|""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
- List("test_db.test_table0", "test_db.test_table1"),
+ List(s"$DEFAULT_CATALOG.test_db.test_table0", s"$DEFAULT_CATALOG.test_db.test_table1"),
List(),
List(
- ("a", Set("test_db.test_table0.a", "test_db.test_table1.a")),
- ("b", Set("test_db.test_table0.b", "test_db.test_table1.b")),
- ("c", Set("test_db.test_table0.b", "test_db.test_table1.c")))))
+ (
+ "a",
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_table0.a",
+ s"$DEFAULT_CATALOG.test_db.test_table1.a")),
+ (
+ "b",
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_table0.b",
+ s"$DEFAULT_CATALOG.test_db.test_table1.b")),
+ (
+ "c",
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_table0.b",
+ s"$DEFAULT_CATALOG.test_db.test_table1.c")))))
}
}
@@ -768,20 +836,22 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|GROUP BY
|stat_date, channel_id, sub_channel_id, user_type, country_name
|""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
- List("test_db.test_order_item"),
+ List(s"$DEFAULT_CATALOG.test_db.test_order_item"),
List(),
List(
- ("stat_date", Set("test_db.test_order_item.stat_date")),
- ("channel_id", Set("test_db.test_order_item.channel_id")),
- ("sub_channel_id", Set("test_db.test_order_item.sub_channel_id")),
- ("user_type", Set("test_db.test_order_item.user_type")),
- ("country_name", Set("test_db.test_order_item.country_name")),
- ("get_count0", Set("test_db.test_order_item.order_id")),
+ ("stat_date", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.stat_date")),
+ ("channel_id", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.channel_id")),
+ ("sub_channel_id", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.sub_channel_id")),
+ ("user_type", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.user_type")),
+ ("country_name", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.country_name")),
+ ("get_count0", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.order_id")),
(
"get_amount0",
- Set("test_db.test_order_item.goods_count", "test_db.test_order_item.shop_price")),
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_order_item.goods_count",
+ s"$DEFAULT_CATALOG.test_db.test_order_item.shop_price")),
("add_time", Set[String]()))))
val sql1 =
"""
@@ -806,104 +876,120 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
|) a
|GROUP BY a.channel_id, a.sub_channel_id, a.country_name
|""".stripMargin
- val ret1 = exectractLineage(sql1)
+ val ret1 = extractLineage(sql1)
assert(ret1 == Lineage(
- List("test_db.test_order_item", "test_db.test_p0_order_item"),
+ List(
+ s"$DEFAULT_CATALOG.test_db.test_order_item",
+ s"$DEFAULT_CATALOG.test_db.test_p0_order_item"),
List(),
List(
(
"channel_id",
- Set("test_db.test_order_item.channel_id", "test_db.test_p0_order_item.channel_id")),
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_order_item.channel_id",
+ s"$DEFAULT_CATALOG.test_db.test_p0_order_item.channel_id")),
(
"sub_channel_id",
Set(
- "test_db.test_order_item.sub_channel_id",
- "test_db.test_p0_order_item.sub_channel_id")),
+ s"$DEFAULT_CATALOG.test_db.test_order_item.sub_channel_id",
+ s"$DEFAULT_CATALOG.test_db.test_p0_order_item.sub_channel_id")),
(
"country_name",
- Set("test_db.test_order_item.country_name", "test_db.test_p0_order_item.country_name")),
- ("get_count0", Set("test_db.test_order_item.order_id")),
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_order_item.country_name",
+ s"$DEFAULT_CATALOG.test_db.test_p0_order_item.country_name")),
+ ("get_count0", Set(s"$DEFAULT_CATALOG.test_db.test_order_item.order_id")),
(
"get_amount0",
- Set("test_db.test_order_item.goods_count", "test_db.test_order_item.shop_price")),
+ Set(
+ s"$DEFAULT_CATALOG.test_db.test_order_item.goods_count",
+ s"$DEFAULT_CATALOG.test_db.test_order_item.shop_price")),
("add_time", Set[String]()))))
}
}
test("columns lineage extract - agg sql") {
val sql0 = """select key as a, count(*) as b, 1 as c from test_db0.test_table0 group by key"""
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.key")),
- ("b", Set("test_db0.test_table0.__count__")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
+ ("b", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.__count__")),
("c", Set()))))
val sql1 = """select count(*) as a, 1 as b from test_db0.test_table0"""
- val ret1 = exectractLineage(sql1)
+ val ret1 = extractLineage(sql1)
assert(ret1 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.__count__")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.__count__")),
("b", Set()))))
val sql2 = """select every(key == 1) as a, 1 as b from test_db0.test_table0"""
- val ret2 = exectractLineage(sql2)
+ val ret2 = extractLineage(sql2)
assert(ret2 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.key")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("b", Set()))))
val sql3 = """select count(*) as a, 1 as b from test_db0.test_table0"""
- val ret3 = exectractLineage(sql3)
+ val ret3 = extractLineage(sql3)
assert(ret3 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.__count__")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.__count__")),
("b", Set()))))
val sql4 = """select first(key) as a, 1 as b from test_db0.test_table0"""
- val ret4 = exectractLineage(sql4)
+ val ret4 = extractLineage(sql4)
assert(ret4 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.key")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("b", Set()))))
val sql5 = """select avg(key) as a, 1 as b from test_db0.test_table0"""
- val ret5 = exectractLineage(sql5)
+ val ret5 = extractLineage(sql5)
assert(ret5 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.key")),
+ ("a", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("b", Set()))))
val sql6 =
"""select count(value) + sum(key) as a,
| 1 as b from test_db0.test_table0""".stripMargin
- val ret6 = exectractLineage(sql6)
+ val ret6 = extractLineage(sql6)
assert(ret6 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.value", "test_db0.test_table0.key")),
+ (
+ "a",
+ Set(
+ s"$DEFAULT_CATALOG.test_db0.test_table0.value",
+ s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("b", Set()))))
val sql7 = """select count(*) + sum(key) as a, 1 as b from test_db0.test_table0"""
- val ret7 = exectractLineage(sql7)
+ val ret7 = extractLineage(sql7)
assert(ret7 == Lineage(
- List("test_db0.test_table0"),
+ List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
List(),
List(
- ("a", Set("test_db0.test_table0.__count__", "test_db0.test_table0.key")),
+ (
+ "a",
+ Set(
+ s"$DEFAULT_CATALOG.test_db0.test_table0.__count__",
+ s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
("b", Set()))))
}
@@ -921,13 +1007,13 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
"""
|select b.a as aa, t0_cached.b0 as bb from t0_cached join table1 b on b.a = t0_cached.a0
|""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
- List("default.table1", "default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table1", s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("aa", Set("default.table1.a")),
- ("bb", Set("default.table0.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table0.b")))))
val df0 = spark.sql("select a as a0, b as b0 from table0 where a = 2")
df0.cache()
@@ -936,11 +1022,11 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
val analyzed = df.queryExecution.analyzed
val ret1 = SparkSQLLineageParseHelper(spark).transformToLineage(0, analyzed).get
assert(ret1 == Lineage(
- List("default.table0", "default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table0", s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("aa", Set("default.table0.a")),
- ("bb", Set("default.table1.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
}
}
@@ -956,155 +1042,155 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
"""
|select a as aa, bb, cc from (select b as bb, c as cc from table1) t0, table0
|""".stripMargin
- val ret0 = exectractLineage(sql0)
+ val ret0 = extractLineage(sql0)
assert(ret0 == Lineage(
- List("default.table0", "default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table0", s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("aa", Set("default.table0.a")),
- ("bb", Set("default.table1.b")),
- ("cc", Set("default.table1.c")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")),
+ ("cc", Set(s"$DEFAULT_CATALOG.default.table1.c")))))
val sql1 =
"""
|select (select a from table1) as aa, b as bb from table1
|""".stripMargin
- val ret1 = exectractLineage(sql1)
+ val ret1 = extractLineage(sql1)
assert(ret1 == Lineage(
- List("default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("aa", Set("default.table1.a")),
- ("bb", Set("default.table1.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
val sql2 =
"""
|select (select count(*) from table0) as aa, b as bb from table1
|""".stripMargin
- val ret2 = exectractLineage(sql2)
+ val ret2 = extractLineage(sql2)
assert(ret2 == Lineage(
- List("default.table0", "default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table0", s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("aa", Set("default.table0.__count__")),
- ("bb", Set("default.table1.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table0.__count__")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
// ListQuery
val sql3 =
"""
|select * from table0 where table0.a in (select a from table1)
|""".stripMargin
- val ret3 = exectractLineage(sql3)
+ val ret3 = extractLineage(sql3)
assert(ret3 == Lineage(
- List("default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("a", Set("default.table0.a")),
- ("b", Set("default.table0.b")),
- ("c", Set("default.table0.c")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table0.b")),
+ ("c", Set(s"$DEFAULT_CATALOG.default.table0.c")))))
// Exists
val sql4 =
"""
|select * from table0 where exists (select * from table1 where table0.c = table1.c)
|""".stripMargin
- val ret4 = exectractLineage(sql4)
+ val ret4 = extractLineage(sql4)
assert(ret4 == Lineage(
- List("default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("a", Set("default.table0.a")),
- ("b", Set("default.table0.b")),
- ("c", Set("default.table0.c")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table0.b")),
+ ("c", Set(s"$DEFAULT_CATALOG.default.table0.c")))))
val sql5 =
"""
|select * from table0 where exists (select * from table1 where c = "odone")
|""".stripMargin
- val ret5 = exectractLineage(sql5)
+ val ret5 = extractLineage(sql5)
assert(ret5 == Lineage(
- List("default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("a", Set("default.table0.a")),
- ("b", Set("default.table0.b")),
- ("c", Set("default.table0.c")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table0.b")),
+ ("c", Set(s"$DEFAULT_CATALOG.default.table0.c")))))
val sql6 =
"""
|select * from table0 where not exists (select * from table1 where c = "odone")
|""".stripMargin
- val ret6 = exectractLineage(sql6)
+ val ret6 = extractLineage(sql6)
assert(ret6 == Lineage(
- List("default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("a", Set("default.table0.a")),
- ("b", Set("default.table0.b")),
- ("c", Set("default.table0.c")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table0.b")),
+ ("c", Set(s"$DEFAULT_CATALOG.default.table0.c")))))
val sql7 =
"""
|select * from table0 where table0.a not in (select a from table1)
|""".stripMargin
- val ret7 = exectractLineage(sql7)
+ val ret7 = extractLineage(sql7)
assert(ret7 == Lineage(
- List("default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("a", Set("default.table0.a")),
- ("b", Set("default.table0.b")),
- ("c", Set("default.table0.c")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table0.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table0.b")),
+ ("c", Set(s"$DEFAULT_CATALOG.default.table0.c")))))
val sql8 =
"""
|select (select a from table1) + 1, b as bb from table1
|""".stripMargin
- val ret8 = exectractLineage(sql8)
+ val ret8 = extractLineage(sql8)
assert(ret8 == Lineage(
- List("default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("(scalarsubquery() + 1)", Set("default.table1.a")),
- ("bb", Set("default.table1.b")))))
+ ("(scalarsubquery() + 1)", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
val sql9 =
"""
|select (select a from table1 limit 1) + 1 as aa, b as bb from table1
|""".stripMargin
- val ret9 = exectractLineage(sql9)
+ val ret9 = extractLineage(sql9)
assert(ret9 == Lineage(
- List("default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("aa", Set("default.table1.a")),
- ("bb", Set("default.table1.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
val sql10 =
"""
|select (select a from table1 limit 1) + (select a from table0 limit 1) + 1 as aa,
| b as bb from table1
|""".stripMargin
- val ret10 = exectractLineage(sql10)
+ val ret10 = extractLineage(sql10)
assert(ret10 == Lineage(
- List("default.table1", "default.table0"),
+ List(s"$DEFAULT_CATALOG.default.table1", s"$DEFAULT_CATALOG.default.table0"),
List(),
List(
- ("aa", Set("default.table1.a", "default.table0.a")),
- ("bb", Set("default.table1.b")))))
+ ("aa", Set(s"$DEFAULT_CATALOG.default.table1.a", s"$DEFAULT_CATALOG.default.table0.a")),
+ ("bb", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
val sql11 =
"""
|select tmp.a, b from (select * from table1) tmp;
|""".stripMargin
- val ret11 = exectractLineage(sql11)
+ val ret11 = extractLineage(sql11)
assert(ret11 == Lineage(
- List("default.table1"),
+ List(s"$DEFAULT_CATALOG.default.table1"),
List(),
List(
- ("a", Set("default.table1.a")),
- ("b", Set("default.table1.b")))))
+ ("a", Set(s"$DEFAULT_CATALOG.default.table1.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.table1.b")))))
}
}
@@ -1115,21 +1201,23 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
spark.sql("CREATE TABLE v2_catalog.db.t1 (a string, b string, c string)")
spark.sql("CREATE TABLE v2_catalog.db.t2 (a string, b string, c string)")
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table t1 select a," +
s"concat_ws('/', collect_set(b))," +
s"count(distinct(b)) * count(distinct(c))" +
s"from t2 group by a")
assert(ret0 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.b")),
- ("default.t1.c", Set("default.t2.b", "default.t2.c")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")),
+ (
+ s"$DEFAULT_CATALOG.default.t1.c",
+ Set(s"$DEFAULT_CATALOG.default.t2.b", s"$DEFAULT_CATALOG.default.t2.c")))))
val ret1 =
- exectractLineage(
+ extractLineage(
s"insert into table v2_catalog.db.t1 select a," +
s"concat_ws('/', collect_set(b))," +
s"count(distinct(b)) * count(distinct(c))" +
@@ -1143,7 +1231,7 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
("v2_catalog.db.t1.c", Set("v2_catalog.db.t2.b", "v2_catalog.db.t2.c")))))
val ret2 =
- exectractLineage(
+ extractLineage(
s"insert into table v2_catalog.db.t1 select a," +
s"count(distinct(b+c))," +
s"count(distinct(b)) * count(distinct(c))" +
@@ -1163,16 +1251,16 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
spark.sql("CREATE TABLE t1 (a string, b string, c string) USING hive")
spark.sql("CREATE TABLE t2 (a string, b string, c string, d string) USING hive")
val ret0 =
- exectractLineage(
+ extractLineage(
s"insert into table t1 select a,b,GROUPING__ID " +
s"from t2 group by a,b,c,d grouping sets ((a,b,c), (a,b,d))")
assert(ret0 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.b")),
- ("default.t1.c", Set()))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")),
+ (s"$DEFAULT_CATALOG.default.t1.c", Set()))))
}
}
@@ -1185,45 +1273,47 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
s"cache table c1 select * from (" +
s"select a, b, row_number() over (partition by a order by b asc ) rank from t2)" +
s" where rank=1")
- val ret0 = exectractLineage("insert overwrite table t1 select a, b from c1")
+ val ret0 = extractLineage("insert overwrite table t1 select a, b from c1")
assert(ret0 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.b")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")))))
- val ret1 = exectractLineage("insert overwrite table t1 select a, rank from c1")
+ val ret1 = extractLineage("insert overwrite table t1 select a, rank from c1")
assert(ret1 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.a", "default.t2.b")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (
+ s"$DEFAULT_CATALOG.default.t1.b",
+ Set(s"$DEFAULT_CATALOG.default.t2.a", s"$DEFAULT_CATALOG.default.t2.b")))))
spark.sql(
s"cache table c2 select * from (" +
s"select b, a, row_number() over (partition by a order by b asc ) rank from t2)" +
s" where rank=1")
- val ret2 = exectractLineage("insert overwrite table t1 select a, b from c2")
+ val ret2 = extractLineage("insert overwrite table t1 select a, b from c2")
assert(ret2 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.b")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")))))
spark.sql(
s"cache table c3 select * from (" +
s"select a as aa, b as bb, row_number() over (partition by a order by b asc ) rank" +
s" from t2) where rank=1")
- val ret3 = exectractLineage("insert overwrite table t1 select aa, bb from c3")
+ val ret3 = extractLineage("insert overwrite table t1 select aa, bb from c3")
assert(ret3 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set("default.t2.a")),
- ("default.t1.b", Set("default.t2.b")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")))))
}
}
@@ -1231,16 +1321,16 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withTable("t1", "t2") { _ =>
spark.sql("CREATE TABLE t1 (a string, b string, c string) USING hive")
spark.sql("CREATE TABLE t2 (a string, b string, c string) USING hive")
- val ret0 = exectractLineage("insert into t1 select 1,2,(select count(distinct" +
+ val ret0 = extractLineage("insert into t1 select 1,2,(select count(distinct" +
" ifnull(get_json_object(a, '$.b.imei'), get_json_object(a, '$.b.android_id'))) from t2)")
assert(ret0 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set()),
- ("default.t1.b", Set()),
- ("default.t1.c", Set("default.t2.a")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set()),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set()),
+ (s"$DEFAULT_CATALOG.default.t1.c", Set(s"$DEFAULT_CATALOG.default.t2.a")))))
}
}
@@ -1250,17 +1340,17 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withView("t2") { _ =>
spark.sql("CREATE VIEW t2 as select * from t1")
val ret0 =
- exectractLineage(
+ extractLineage(
s"create or replace view view_tst comment 'view'" +
s" as select a as k,b" +
s" from t2" +
s" where a in ('HELLO') and c = 'HELLO'")
assert(ret0 == Lineage(
- List("default.t1"),
- List("default.view_tst"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
+ List(s"$DEFAULT_CATALOG.default.view_tst"),
List(
- ("default.view_tst.k", Set("default.t1.a")),
- ("default.view_tst.b", Set("default.t1.b")))))
+ (s"$DEFAULT_CATALOG.default.view_tst.k", Set(s"$DEFAULT_CATALOG.default.t1.a")),
+ (s"$DEFAULT_CATALOG.default.view_tst.b", Set(s"$DEFAULT_CATALOG.default.t1.b")))))
}
}
}
@@ -1272,16 +1362,16 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
withView("t2") { _ =>
spark.sql("CREATE VIEW t2 as select * from t1")
val ret0 =
- exectractLineage(
+ extractLineage(
s"select a as k, b" +
s" from t2" +
s" where a in ('HELLO') and c = 'HELLO'")
assert(ret0 == Lineage(
- List("default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
List(),
List(
- ("k", Set("default.t2.a")),
- ("b", Set("default.t2.b")))))
+ ("k", Set(s"$DEFAULT_CATALOG.default.t2.a")),
+ ("b", Set(s"$DEFAULT_CATALOG.default.t2.b")))))
}
}
}
@@ -1291,17 +1381,17 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
spark.sql("CREATE TABLE t1 (a string, b string) USING hive")
spark.sql("CREATE TABLE t2 (a string, b string) USING hive")
spark.sql("CREATE TABLE t3 (a string, b string) USING hive")
- val ret0 = exectractLineage("from (select a,b from t1)" +
+ val ret0 = extractLineage("from (select a,b from t1)" +
" insert overwrite table t2 select a,b where a=1" +
" insert overwrite table t3 select a,b where b=1")
assert(ret0 == Lineage(
- List("default.t1"),
- List("default.t2", "default.t3"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2", s"$DEFAULT_CATALOG.default.t3"),
List(
- ("default.t2.a", Set("default.t1.a")),
- ("default.t2.b", Set("default.t1.b")),
- ("default.t3.a", Set("default.t1.a")),
- ("default.t3.b", Set("default.t1.b")))))
+ (s"$DEFAULT_CATALOG.default.t2.a", Set(s"$DEFAULT_CATALOG.default.t1.a")),
+ (s"$DEFAULT_CATALOG.default.t2.b", Set(s"$DEFAULT_CATALOG.default.t1.b")),
+ (s"$DEFAULT_CATALOG.default.t3.a", Set(s"$DEFAULT_CATALOG.default.t1.a")),
+ (s"$DEFAULT_CATALOG.default.t3.b", Set(s"$DEFAULT_CATALOG.default.t1.b")))))
}
}
@@ -1310,52 +1400,52 @@ class SparkSQLLineageParserHelperSuite extends KyuubiFunSuite
spark.sql("CREATE TABLE t1 (a string, b string, c string, d string) USING hive")
spark.sql("CREATE TABLE t2 (a string, b string, c string, d string) USING hive")
- val ret0 = exectractLineage("insert into t1 select 1, t2.b, cc.action, t2.d " +
+ val ret0 = extractLineage("insert into t1 select 1, t2.b, cc.action, t2.d " +
"from t2 lateral view explode(split(c,'\\},\\{')) cc as action")
assert(ret0 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set()),
- ("default.t1.b", Set("default.t2.b")),
- ("default.t1.c", Set("default.t2.c")),
- ("default.t1.d", Set("default.t2.d")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set()),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")),
+ (s"$DEFAULT_CATALOG.default.t1.c", Set(s"$DEFAULT_CATALOG.default.t2.c")),
+ (s"$DEFAULT_CATALOG.default.t1.d", Set(s"$DEFAULT_CATALOG.default.t2.d")))))
- val ret1 = exectractLineage("insert into t1 select 1, t2.b, cc.action0, dd.action1 " +
+ val ret1 = extractLineage("insert into t1 select 1, t2.b, cc.action0, dd.action1 " +
"from t2 " +
"lateral view explode(split(c,'\\},\\{')) cc as action0 " +
"lateral view explode(split(d,'\\},\\{')) dd as action1")
assert(ret1 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set()),
- ("default.t1.b", Set("default.t2.b")),
- ("default.t1.c", Set("default.t2.c")),
- ("default.t1.d", Set("default.t2.d")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set()),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")),
+ (s"$DEFAULT_CATALOG.default.t1.c", Set(s"$DEFAULT_CATALOG.default.t2.c")),
+ (s"$DEFAULT_CATALOG.default.t1.d", Set(s"$DEFAULT_CATALOG.default.t2.d")))))
- val ret2 = exectractLineage("insert into t1 select 1, t2.b, dd.pos, dd.action1 " +
+ val ret2 = extractLineage("insert into t1 select 1, t2.b, dd.pos, dd.action1 " +
"from t2 " +
"lateral view posexplode(split(d,'\\},\\{')) dd as pos, action1")
assert(ret2 == Lineage(
- List("default.t2"),
- List("default.t1"),
+ List(s"$DEFAULT_CATALOG.default.t2"),
+ List(s"$DEFAULT_CATALOG.default.t1"),
List(
- ("default.t1.a", Set()),
- ("default.t1.b", Set("default.t2.b")),
- ("default.t1.c", Set("default.t2.d")),
- ("default.t1.d", Set("default.t2.d")))))
+ (s"$DEFAULT_CATALOG.default.t1.a", Set()),
+ (s"$DEFAULT_CATALOG.default.t1.b", Set(s"$DEFAULT_CATALOG.default.t2.b")),
+ (s"$DEFAULT_CATALOG.default.t1.c", Set(s"$DEFAULT_CATALOG.default.t2.d")),
+ (s"$DEFAULT_CATALOG.default.t1.d", Set(s"$DEFAULT_CATALOG.default.t2.d")))))
}
}
- private def exectractLineageWithoutExecuting(sql: String): Lineage = {
+ private def extractLineageWithoutExecuting(sql: String): Lineage = {
val parsed = spark.sessionState.sqlParser.parsePlan(sql)
val analyzed = spark.sessionState.analyzer.execute(parsed)
spark.sessionState.analyzer.checkAnalysis(analyzed)
SparkSQLLineageParseHelper(spark).transformToLineage(0, analyzed).get
}
- private def exectractLineage(sql: String): Lineage = {
+ private def extractLineage(sql: String): Lineage = {
val parsed = spark.sessionState.sqlParser.parsePlan(sql)
val qe = spark.sessionState.executePlan(parsed)
val analyzed = qe.analyzed
diff --git a/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/spark/sql/SparkListenerExtenstionTest.scala b/extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/spark/sql/SparkListenerExtensionTest.scala
similarity index 100%
rename from extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/spark/sql/SparkListenerExtenstionTest.scala
rename to extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/spark/sql/SparkListenerExtensionTest.scala
diff --git a/externals/kyuubi-chat-engine/pom.xml b/externals/kyuubi-chat-engine/pom.xml
index 28779f4504f..3639ceed329 100644
--- a/externals/kyuubi-chat-engine/pom.xml
+++ b/externals/kyuubi-chat-engine/pom.xml
@@ -21,11 +21,11 @@
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-parent</artifactId>
- <version>1.8.0-SNAPSHOT</version>
+ <version>1.9.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

- <artifactId>kyuubi-chat-engine_2.12</artifactId>
+ <artifactId>kyuubi-chat-engine_${scala.binary.version}</artifactId>
<packaging>jar</packaging>
<name>Kyuubi Project Engine Chat</name>
<url>https://kyuubi.apache.org/</url>
diff --git a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/operation/ChatOperation.scala b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/operation/ChatOperation.scala
index 38527cbf1f8..b0b1806f80c 100644
--- a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/operation/ChatOperation.scala
+++ b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/operation/ChatOperation.scala
@@ -31,7 +31,9 @@ abstract class ChatOperation(session: Session) extends AbstractOperation(session
protected lazy val conf: KyuubiConf = session.sessionManager.getConf
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
@@ -47,7 +49,10 @@ abstract class ChatOperation(session: Session) extends AbstractOperation(session
val taken = iter.take(rowSetSize)
val resultRowSet = RowSet.toTRowSet(taken.toSeq, 1, getProtocolVersion)
resultRowSet.setStartRowOffset(iter.getPosition)
- resultRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(resultRowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override def cancel(): Unit = {
@@ -62,7 +67,7 @@ abstract class ChatOperation(session: Session) extends AbstractOperation(session
// We should use Throwable instead of Exception since `java.lang.NoClassDefFoundError`
// could be thrown.
case e: Throwable =>
- state.synchronized {
+ withLockRequired {
val errMsg = Utils.stringifyException(e)
if (state == OperationState.TIMEOUT) {
val ke = KyuubiSQLException(s"Timeout operating $opType: $errMsg")
diff --git a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatGPTProvider.scala b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatGPTProvider.scala
index cdea89d2aad..aae8b488a5c 100644
--- a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatGPTProvider.scala
+++ b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatGPTProvider.scala
@@ -26,7 +26,7 @@ import scala.collection.JavaConverters._
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
import com.theokanning.openai.OpenAiApi
-import com.theokanning.openai.completion.chat.{ChatCompletionRequest, ChatMessage}
+import com.theokanning.openai.completion.chat.{ChatCompletionRequest, ChatMessage, ChatMessageRole}
import com.theokanning.openai.service.OpenAiService
import com.theokanning.openai.service.OpenAiService.{defaultClient, defaultObjectMapper, defaultRetrofit}
@@ -60,6 +60,8 @@ class ChatGPTProvider(conf: KyuubiConf) extends ChatProvider {
new OpenAiService(api)
}
+ private var sessionUser: Option[String] = None
+
private val chatHistory: LoadingCache[String, util.ArrayDeque[ChatMessage]] =
CacheBuilder.newBuilder()
.expireAfterWrite(10, TimeUnit.MINUTES)
@@ -68,20 +70,23 @@ class ChatGPTProvider(conf: KyuubiConf) extends ChatProvider {
new util.ArrayDeque[ChatMessage]
})
- override def open(sessionId: String): Unit = {
+ override def open(sessionId: String, user: Option[String]): Unit = {
+ sessionUser = user
chatHistory.getIfPresent(sessionId)
}
override def ask(sessionId: String, q: String): String = {
val messages = chatHistory.get(sessionId)
try {
- messages.addLast(new ChatMessage("user", q))
+ messages.addLast(new ChatMessage(ChatMessageRole.USER.value(), q))
val completionRequest = ChatCompletionRequest.builder()
.model(conf.get(KyuubiConf.ENGINE_CHAT_GPT_MODEL))
.messages(messages.asScala.toList.asJava)
+ .user(sessionUser.orNull)
+ .n(1)
.build()
- val responseText = openAiService.createChatCompletion(completionRequest).getChoices.asScala
- .map(c => c.getMessage.getContent).mkString
+ val responseText = openAiService.createChatCompletion(completionRequest)
+ .getChoices.get(0).getMessage.getContent
responseText
} catch {
case e: Throwable =>
diff --git a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatProvider.scala b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatProvider.scala
index af1ba434bea..06d7193805f 100644
--- a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatProvider.scala
+++ b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/ChatProvider.scala
@@ -24,11 +24,11 @@ import com.fasterxml.jackson.module.scala.{ClassTagExtensions, DefaultScalaModul
import org.apache.kyuubi.{KyuubiException, Logging}
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.reflection.DynConstructors
+import org.apache.kyuubi.util.reflect.DynConstructors
trait ChatProvider {
- def open(sessionId: String): Unit
+ def open(sessionId: String, user: Option[String] = None): Unit
def ask(sessionId: String, q: String): String
diff --git a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/EchoProvider.scala b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/EchoProvider.scala
index 31ad3b8e390..1116ea785dc 100644
--- a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/EchoProvider.scala
+++ b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/provider/EchoProvider.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.chat.provider
class EchoProvider extends ChatProvider {
- override def open(sessionId: String): Unit = {}
+ override def open(sessionId: String, user: Option[String]): Unit = {}
override def ask(sessionId: String, q: String): String =
"This is ChatKyuubi, nice to meet you!"
diff --git a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/session/ChatSessionImpl.scala b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/session/ChatSessionImpl.scala
index 29f42076822..6ec6d062600 100644
--- a/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/session/ChatSessionImpl.scala
+++ b/externals/kyuubi-chat-engine/src/main/scala/org/apache/kyuubi/engine/chat/session/ChatSessionImpl.scala
@@ -38,7 +38,7 @@ class ChatSessionImpl(
override def open(): Unit = {
info(s"Starting to open chat session.")
- chatProvider.open(handle.identifier.toString)
+ chatProvider.open(handle.identifier.toString, Some(user))
super.open()
info(s"The chat session is started.")
}
diff --git a/externals/kyuubi-download/pom.xml b/externals/kyuubi-download/pom.xml
index d7f0c601322..b21e3e5a223 100644
--- a/externals/kyuubi-download/pom.xml
+++ b/externals/kyuubi-download/pom.xml
@@ -21,7 +21,7 @@
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-parent</artifactId>
- <version>1.8.0-SNAPSHOT</version>
+ <version>1.9.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
diff --git a/externals/kyuubi-flink-sql-engine/pom.xml b/externals/kyuubi-flink-sql-engine/pom.xml
index f3633b904f5..eec5c1cd9e8 100644
--- a/externals/kyuubi-flink-sql-engine/pom.xml
+++ b/externals/kyuubi-flink-sql-engine/pom.xml
@@ -21,11 +21,11 @@
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-parent</artifactId>
- <version>1.8.0-SNAPSHOT</version>
+ <version>1.9.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

- <artifactId>kyuubi-flink-sql-engine_2.12</artifactId>
+ <artifactId>kyuubi-flink-sql-engine_${scala.binary.version}</artifactId>
<packaging>jar</packaging>
<name>Kyuubi Project Engine Flink SQL</name>
<url>https://kyuubi.apache.org/</url>
@@ -77,25 +77,25 @@
<dependency>
<groupId>org.apache.flink</groupId>
- <artifactId>flink-table-common</artifactId>
+ <artifactId>flink-sql-gateway</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.flink</groupId>
- <artifactId>flink-table-api-java</artifactId>
+ <artifactId>flink-table-common</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.flink</groupId>
- <artifactId>flink-table-api-java-bridge</artifactId>
+ <artifactId>flink-table-api-java</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.flink</groupId>
- <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
+ <artifactId>flink-table-api-java-bridge</artifactId>
<scope>provided</scope>
</dependency>
@@ -105,12 +105,6 @@
<scope>provided</scope>
</dependency>

- <dependency>
- <groupId>org.apache.flink</groupId>
- <artifactId>flink-sql-parser</artifactId>
- <scope>provided</scope>
- </dependency>
-
<dependency>
<groupId>org.apache.kyuubi</groupId>
@@ -126,11 +120,49 @@
<version>${project.version}</version>
<scope>test</scope>
</dependency>
+
+ <dependency>
+ <groupId>org.apache.kyuubi</groupId>
+ <artifactId>kyuubi-zookeeper_${scala.binary.version}</artifactId>
+ <version>${project.version}</version>
+ <scope>test</scope>
+ </dependency>
+
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-test-utils</artifactId>
<scope>test</scope>
</dependency>
+
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client-minicluster</artifactId>
+ <scope>test</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>org.bouncycastle</groupId>
+ <artifactId>bcprov-jdk15on</artifactId>
+ <scope>test</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>org.bouncycastle</groupId>
+ <artifactId>bcpkix-jdk15on</artifactId>
+ <scope>test</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>jakarta.activation</groupId>
+ <artifactId>jakarta.activation-api</artifactId>
+ <scope>test</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>jakarta.xml.bind</groupId>
+ <artifactId>jakarta.xml.bind-api</artifactId>
+ <scope>test</scope>
+ </dependency>
@@ -142,20 +174,15 @@
false
- <include>org.apache.kyuubi:kyuubi-common_${scala.binary.version}</include>
- <include>org.apache.kyuubi:kyuubi-ha_${scala.binary.version}</include>
<include>com.fasterxml.jackson.core:*</include>
<include>com.fasterxml.jackson.module:*</include>
<include>com.google.guava:failureaccess</include>
<include>com.google.guava:guava</include>
<include>commons-codec:commons-codec</include>
<include>org.apache.commons:commons-lang3</include>
- <include>org.apache.curator:curator-client</include>
- <include>org.apache.curator:curator-framework</include>
- <include>org.apache.curator:curator-recipes</include>
<include>org.apache.hive:hive-service-rpc</include>
<include>org.apache.thrift:*</include>
- <include>org.apache.zookeeper:*</include>
+ <include>org.apache.kyuubi:*</include>
@@ -184,13 +211,6 @@
<include>com.fasterxml.jackson.**</include>
</includes>
</relocation>
- <relocation>
- <pattern>org.apache.curator</pattern>
- <shadedPattern>${kyuubi.shade.packageName}.org.apache.curator</shadedPattern>
- <includes>
- <include>org.apache.curator.**</include>
- </includes>
- </relocation>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>${kyuubi.shade.packageName}.com.google.common</shadedPattern>
@@ -234,20 +254,6 @@
<include>org.apache.thrift.**</include>
</includes>
</relocation>
- <relocation>
- <pattern>org.apache.jute</pattern>
- <shadedPattern>${kyuubi.shade.packageName}.org.apache.jute</shadedPattern>
- <includes>
- <include>org.apache.jute.**</include>
- </includes>
- </relocation>
- <relocation>
- <pattern>org.apache.zookeeper</pattern>
- <shadedPattern>${kyuubi.shade.packageName}.org.apache.zookeeper</shadedPattern>
- <includes>
- <include>org.apache.zookeeper.**</include>
- </includes>
- </relocation>
diff --git a/externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java b/externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java
new file mode 100644
index 00000000000..558db74a372
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.client.deployment.application.executors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+import java.util.Collection;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.client.cli.ClientOptions;
+import org.apache.flink.client.deployment.application.EmbeddedJobClient;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.DeploymentOptions;
+import org.apache.flink.core.execution.PipelineExecutor;
+import org.apache.flink.core.execution.PipelineExecutorFactory;
+import org.apache.flink.runtime.dispatcher.DispatcherGateway;
+import org.apache.flink.util.concurrent.ScheduledExecutor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Copied from Apache Flink to expose the DispatcherGateway for Kyuubi statements. */
+@Internal
+public class EmbeddedExecutorFactory implements PipelineExecutorFactory {
+
+  private static Collection<JobID> bootstrapJobIds;
+
+  private static Collection<JobID> submittedJobIds;
+
+ private static DispatcherGateway dispatcherGateway;
+
+ private static ScheduledExecutor retryExecutor;
+
+ private static final Object bootstrapLock = new Object();
+
+ private static final long BOOTSTRAP_WAIT_INTERVAL = 10_000L;
+
+ private static final int BOOTSTRAP_WAIT_RETRIES = 3;
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(EmbeddedExecutorFactory.class);
+
+ public EmbeddedExecutorFactory() {
+ LOGGER.debug(
+ "{} loaded in thread {} with classloader {}.",
+ this.getClass().getCanonicalName(),
+ Thread.currentThread().getName(),
+ this.getClass().getClassLoader().toString());
+ }
+
+ /**
+ * Creates an {@link EmbeddedExecutorFactory}.
+ *
+ * @param submittedJobIds a list that is going to be filled with the job ids of the new jobs that
+ * will be submitted. This is essentially used to return the submitted job ids to the caller.
+ * @param dispatcherGateway the dispatcher of the cluster which is going to be used to submit
+ * jobs.
+ */
+ public EmbeddedExecutorFactory(
+      final Collection<JobID> submittedJobIds,
+ final DispatcherGateway dispatcherGateway,
+ final ScheduledExecutor retryExecutor) {
+ // there should be only one instance of EmbeddedExecutorFactory
+ LOGGER.debug(
+ "{} initiated in thread {} with classloader {}.",
+ this.getClass().getCanonicalName(),
+ Thread.currentThread().getName(),
+ this.getClass().getClassLoader().toString());
+ checkState(EmbeddedExecutorFactory.submittedJobIds == null);
+ checkState(EmbeddedExecutorFactory.dispatcherGateway == null);
+ checkState(EmbeddedExecutorFactory.retryExecutor == null);
+ synchronized (bootstrapLock) {
+      // the size of submittedJobIds is always 1, because we create a new list to avoid
+      // concurrent access issues
+ LOGGER.debug("Bootstrapping EmbeddedExecutorFactory.");
+ EmbeddedExecutorFactory.submittedJobIds =
+ new ConcurrentLinkedQueue<>(checkNotNull(submittedJobIds));
+ EmbeddedExecutorFactory.bootstrapJobIds = submittedJobIds;
+ EmbeddedExecutorFactory.dispatcherGateway = checkNotNull(dispatcherGateway);
+ EmbeddedExecutorFactory.retryExecutor = checkNotNull(retryExecutor);
+ bootstrapLock.notifyAll();
+ }
+ }
+
+ @Override
+ public String getName() {
+ return EmbeddedExecutor.NAME;
+ }
+
+ @Override
+ public boolean isCompatibleWith(final Configuration configuration) {
+ // override Flink's implementation to allow usage in Kyuubi
+ LOGGER.debug("Matching execution target: {}", configuration.get(DeploymentOptions.TARGET));
+ return configuration.get(DeploymentOptions.TARGET).equalsIgnoreCase("yarn-application")
+ && configuration.toMap().getOrDefault("yarn.tags", "").toLowerCase().contains("kyuubi");
+ }
+
+ @Override
+ public PipelineExecutor getExecutor(final Configuration configuration) {
+ checkNotNull(configuration);
+    Collection<JobID> executorJobIDs;
+ synchronized (bootstrapLock) {
+ // wait in a loop to avoid spurious wakeups
+ int retry = 0;
+ while (bootstrapJobIds == null && retry < BOOTSTRAP_WAIT_RETRIES) {
+ try {
+ LOGGER.debug("Waiting for bootstrap to complete. Wait retries: {}.", retry);
+ bootstrapLock.wait(BOOTSTRAP_WAIT_INTERVAL);
+ retry++;
+ } catch (InterruptedException e) {
+ throw new RuntimeException("Interrupted while waiting for bootstrap.", e);
+ }
+ }
+ if (bootstrapJobIds == null) {
+ throw new RuntimeException(
+ "Bootstrap of Flink SQL engine timed out after "
+ + BOOTSTRAP_WAIT_INTERVAL * BOOTSTRAP_WAIT_RETRIES
+ + " ms. Please check the engine log for more details.");
+ }
+ }
+ if (bootstrapJobIds.size() > 0) {
+      LOGGER.info("Submitting a new Kyuubi job. Jobs submitted so far: {}.", submittedJobIds.size());
+ executorJobIDs = submittedJobIds;
+ } else {
+ LOGGER.info("Bootstrapping Flink SQL engine with the initial SQL.");
+ executorJobIDs = bootstrapJobIds;
+ }
+ return new EmbeddedExecutor(
+ executorJobIDs,
+ dispatcherGateway,
+ (jobId, userCodeClassloader) -> {
+ final Time timeout =
+ Time.milliseconds(configuration.get(ClientOptions.CLIENT_TIMEOUT).toMillis());
+ return new EmbeddedJobClient(
+ jobId, dispatcherGateway, retryExecutor, timeout, userCodeClassloader);
+ });
+ }
+}
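The two-phase handshake between the bootstrap constructor and `getExecutor` above is a standard monitor wait/notify pattern: the waiter loops on the condition to guard against spurious wakeups and bounds the total wait with a retry counter. A minimal self-contained sketch of the same pattern, using illustrative names rather than Kyuubi's actual fields:

```java
// Sketch of the bootstrap handshake: one thread waits on a monitor in a loop
// (guarding against spurious wakeups), another publishes shared state and
// calls notifyAll(). "sharedState" stands in for bootstrapJobIds.
public class BootstrapHandshake {
    private static final Object lock = new Object();
    private static String sharedState; // guarded by lock

    public static void main(String[] args) throws InterruptedException {
        Thread publisher = new Thread(() -> {
            synchronized (lock) {
                sharedState = "bootstrapped";
                lock.notifyAll(); // wake up any thread blocked in lock.wait()
            }
        });
        publisher.start();

        synchronized (lock) {
            int retry = 0;
            // wait() may return spuriously, so always re-check the condition
            while (sharedState == null && retry < 3) {
                lock.wait(1000);
                retry++;
            }
            if (sharedState == null) {
                throw new IllegalStateException("bootstrap timed out");
            }
            System.out.println(sharedState);
        }
    }
}
```

If the publisher wins the race and notifies first, the waiter observes the non-null state and never blocks, which is why the condition check, not the notification, is authoritative.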
diff --git a/externals/kyuubi-flink-sql-engine/src/main/resources/META-INF/services/org.apache.flink.core.execution.PipelineExecutorFactory b/externals/kyuubi-flink-sql-engine/src/main/resources/META-INF/services/org.apache.flink.core.execution.PipelineExecutorFactory
new file mode 100644
index 00000000000..c394c07a7ba
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/resources/META-INF/services/org.apache.flink.core.execution.PipelineExecutorFactory
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+org.apache.flink.client.deployment.application.executors.EmbeddedExecutorFactory
\ No newline at end of file
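The file above is a standard JDK `ServiceLoader` provider-configuration file: Flink discovers `PipelineExecutorFactory` implementations by scanning `META-INF/services/org.apache.flink.core.execution.PipelineExecutorFactory` on the classpath, so listing the Kyuubi class makes it a candidate executor at runtime. A minimal sketch of the discovery mechanism, with a hypothetical provider interface standing in for Flink's:

```java
import java.util.ServiceLoader;

// Hypothetical SPI interface; Flink's real one is PipelineExecutorFactory.
interface DemoFactory {
    String name();
}

public class ServiceLoaderDemo {
    public static void main(String[] args) {
        // ServiceLoader reads META-INF/services/<fully-qualified-interface-name>
        // and instantiates every implementation class listed there (one per line).
        ServiceLoader<DemoFactory> loader = ServiceLoader.load(DemoFactory.class);
        int found = 0;
        for (DemoFactory factory : loader) {
            System.out.println("discovered: " + factory.name());
            found++;
        }
        // this sketch ships no provider file on the classpath, so nothing is found
        System.out.println("factories discovered: " + found);
    }
}
```

This is also why `isCompatibleWith` in the factory above must reject non-Kyuubi configurations: every factory on the classpath is consulted, and only the matching one is used.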
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkEngineUtils.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkEngineUtils.scala
index 69fc8c69573..7d42aae8c87 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkEngineUtils.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkEngineUtils.scala
@@ -18,33 +18,43 @@
package org.apache.kyuubi.engine.flink
import java.io.File
+import java.lang.{Boolean => JBoolean}
import java.net.URL
+import java.util.{ArrayList => JArrayList, Collections => JCollections, List => JList}
import scala.collection.JavaConverters._
+import scala.collection.convert.ImplicitConversions._
-import org.apache.commons.cli.{CommandLine, DefaultParser, Option, Options, ParseException}
+import org.apache.commons.cli.{CommandLine, DefaultParser, Options}
+import org.apache.flink.api.common.JobID
+import org.apache.flink.client.cli.{CustomCommandLine, DefaultCLI, GenericCLI}
+import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.Path
import org.apache.flink.runtime.util.EnvironmentInformation
import org.apache.flink.table.client.SqlClientException
-import org.apache.flink.table.client.cli.CliOptions
+import org.apache.flink.table.client.cli.CliOptionsParser
import org.apache.flink.table.client.cli.CliOptionsParser._
-import org.apache.flink.table.client.gateway.context.SessionContext
-import org.apache.flink.table.client.gateway.local.LocalExecutor
+import org.apache.flink.table.gateway.service.context.{DefaultContext, SessionContext}
+import org.apache.flink.table.gateway.service.result.ResultFetcher
+import org.apache.flink.table.gateway.service.session.Session
+import org.apache.flink.util.JarUtils
-import org.apache.kyuubi.Logging
-import org.apache.kyuubi.engine.SemanticVersion
+import org.apache.kyuubi.{KyuubiException, Logging}
+import org.apache.kyuubi.util.SemanticVersion
+import org.apache.kyuubi.util.reflect._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
object FlinkEngineUtils extends Logging {
- val MODE_EMBEDDED = "embedded"
- val EMBEDDED_MODE_CLIENT_OPTIONS: Options = getEmbeddedModeClientOptions(new Options);
+ val EMBEDDED_MODE_CLIENT_OPTIONS: Options = getEmbeddedModeClientOptions(new Options)
- val SUPPORTED_FLINK_VERSIONS: Array[SemanticVersion] =
- Array("1.15", "1.16").map(SemanticVersion.apply)
+ private def SUPPORTED_FLINK_VERSIONS = Set("1.16", "1.17").map(SemanticVersion.apply)
+
+ val FLINK_RUNTIME_VERSION: SemanticVersion = SemanticVersion(EnvironmentInformation.getVersion)
def checkFlinkVersion(): Unit = {
val flinkVersion = EnvironmentInformation.getVersion
- if (SUPPORTED_FLINK_VERSIONS.contains(SemanticVersion(flinkVersion))) {
+ if (SUPPORTED_FLINK_VERSIONS.contains(FLINK_RUNTIME_VERSION)) {
info(s"The current Flink version is $flinkVersion")
} else {
throw new UnsupportedOperationException(
@@ -53,56 +63,90 @@ object FlinkEngineUtils extends Logging {
}
}
- def isFlinkVersionAtMost(targetVersionString: String): Boolean =
- SemanticVersion(EnvironmentInformation.getVersion).isVersionAtMost(targetVersionString)
-
- def isFlinkVersionAtLeast(targetVersionString: String): Boolean =
- SemanticVersion(EnvironmentInformation.getVersion).isVersionAtLeast(targetVersionString)
-
- def isFlinkVersionEqualTo(targetVersionString: String): Boolean =
- SemanticVersion(EnvironmentInformation.getVersion).isVersionEqualTo(targetVersionString)
-
- def parseCliOptions(args: Array[String]): CliOptions = {
- val (mode, modeArgs) =
- if (args.isEmpty || args(0).startsWith("-")) (MODE_EMBEDDED, args)
- else (args(0), args.drop(1))
- val options = parseEmbeddedModeClient(modeArgs)
- if (mode == MODE_EMBEDDED) {
- if (options.isPrintHelp) {
- printHelpEmbeddedModeClient()
+ /**
+ * Copied and modified from [[org.apache.flink.table.client.cli.CliOptionsParser]]
+   * to avoid loading flink-python classes that we don't support yet.
+ */
+ private def discoverDependencies(
+ jars: JList[URL],
+ libraries: JList[URL]): JList[URL] = {
+ val dependencies: JList[URL] = new JArrayList[URL]
+ try { // find jar files
+ for (url <- jars) {
+ JarUtils.checkJarFile(url)
+ dependencies.add(url)
}
- options
- } else {
- throw new SqlClientException("Other mode is not supported yet.")
+ // find jar files in library directories
+ libraries.foreach { libUrl =>
+ val dir: File = new File(libUrl.toURI)
+ if (!dir.isDirectory) throw new SqlClientException(s"Directory expected: $dir")
+ if (!dir.canRead) throw new SqlClientException(s"Directory cannot be read: $dir")
+ val files: Array[File] = dir.listFiles
+ if (files == null) throw new SqlClientException(s"Directory cannot be read: $dir")
+ files.filter { f => f.isFile && f.getAbsolutePath.toLowerCase.endsWith(".jar") }
+ .foreach { f =>
+ val url: URL = f.toURI.toURL
+ JarUtils.checkJarFile(url)
+ dependencies.add(url)
+ }
+ }
+ } catch {
+ case e: Exception =>
+ throw new SqlClientException("Could not load all required JAR files.", e)
}
+ dependencies
}
- def getSessionContext(localExecutor: LocalExecutor, sessionId: String): SessionContext = {
- val method = classOf[LocalExecutor].getDeclaredMethod("getSessionContext", classOf[String])
- method.setAccessible(true)
- method.invoke(localExecutor, sessionId).asInstanceOf[SessionContext]
+ def getDefaultContext(
+ args: Array[String],
+ flinkConf: Configuration,
+ flinkConfDir: String): DefaultContext = {
+ val parser = new DefaultParser
+ val line = parser.parse(EMBEDDED_MODE_CLIENT_OPTIONS, args, true)
+ val jars: JList[URL] = Option(checkUrls(line, CliOptionsParser.OPTION_JAR))
+ .getOrElse(JCollections.emptyList())
+ val libDirs: JList[URL] = Option(checkUrls(line, CliOptionsParser.OPTION_LIBRARY))
+ .getOrElse(JCollections.emptyList())
+ val dependencies: JList[URL] = discoverDependencies(jars, libDirs)
+ if (FLINK_RUNTIME_VERSION === "1.16") {
+ val commandLines: JList[CustomCommandLine] =
+ Seq(new GenericCLI(flinkConf, flinkConfDir), new DefaultCLI).asJava
+ DynConstructors.builder()
+ .impl(
+ classOf[DefaultContext],
+ classOf[Configuration],
+ classOf[JList[CustomCommandLine]])
+ .build()
+ .newInstance(flinkConf, commandLines)
+ .asInstanceOf[DefaultContext]
+ } else if (FLINK_RUNTIME_VERSION === "1.17") {
+ invokeAs[DefaultContext](
+ classOf[DefaultContext],
+ "load",
+ (classOf[Configuration], flinkConf),
+ (classOf[JList[URL]], dependencies),
+ (classOf[Boolean], JBoolean.TRUE),
+ (classOf[Boolean], JBoolean.FALSE))
+ } else {
+ throw new KyuubiException(
+        s"Flink version ${EnvironmentInformation.getVersion} is not supported currently.")
+ }
}
- def parseEmbeddedModeClient(args: Array[String]): CliOptions =
+ def getSessionContext(session: Session): SessionContext = getField(session, "sessionContext")
+
+ def getResultJobId(resultFetch: ResultFetcher): Option[JobID] = {
+ if (FLINK_RUNTIME_VERSION <= "1.16") {
+ return None
+ }
try {
- val parser = new DefaultParser
- val line = parser.parse(EMBEDDED_MODE_CLIENT_OPTIONS, args, true)
- val jarUrls = checkUrls(line, OPTION_JAR)
- val libraryUrls = checkUrls(line, OPTION_LIBRARY)
- new CliOptions(
- line.hasOption(OPTION_HELP.getOpt),
- checkSessionId(line),
- checkUrl(line, OPTION_INIT_FILE),
- checkUrl(line, OPTION_FILE),
- if (jarUrls != null && jarUrls.nonEmpty) jarUrls.asJava else null,
- if (libraryUrls != null && libraryUrls.nonEmpty) libraryUrls.asJava else null,
- line.getOptionValue(OPTION_UPDATE.getOpt),
- line.getOptionValue(OPTION_HISTORY.getOpt),
- null)
+ Option(getField[JobID](resultFetch, "jobID"))
} catch {
- case e: ParseException =>
- throw new SqlClientException(e.getMessage)
+ case _: NullPointerException => None
+ case e: Throwable =>
+ throw new IllegalStateException("Unexpected error occurred while fetching query ID", e)
}
+ }
def checkSessionId(line: CommandLine): String = {
val sessionId = line.getOptionValue(OPTION_SESSION.getOpt)
@@ -111,13 +155,13 @@ object FlinkEngineUtils extends Logging {
} else sessionId
}
- def checkUrl(line: CommandLine, option: Option): URL = {
- val urls: List[URL] = checkUrls(line, option)
+ def checkUrl(line: CommandLine, option: org.apache.commons.cli.Option): URL = {
+ val urls: JList[URL] = checkUrls(line, option)
if (urls != null && urls.nonEmpty) urls.head
else null
}
- def checkUrls(line: CommandLine, option: Option): List[URL] = {
+ def checkUrls(line: CommandLine, option: org.apache.commons.cli.Option): JList[URL] = {
if (line.hasOption(option.getOpt)) {
line.getOptionValues(option.getOpt).distinct.map((url: String) => {
checkFilePath(url)
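`getDefaultContext` above branches on the runtime Flink version because `DefaultContext` construction changed between releases: 1.16 exposes a constructor taking the configuration and command lines, while 1.17 moved to a static `load` factory. The dispatch reduces to a major.minor comparison; a simplified sketch (the parsing helper and the returned descriptions are illustrative, Kyuubi uses its own `SemanticVersion` class and reflection utilities):

```java
// Sketch of version-gated dispatch: parse "major.minor" from the runtime
// version string and choose the construction path accordingly.
public class VersionDispatch {
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    static String contextLoaderFor(String flinkVersion) {
        int[] v = majorMinor(flinkVersion);
        if (v[0] == 1 && v[1] == 16) {
            return "constructor(Configuration, List<CustomCommandLine>)";
        } else if (v[0] == 1 && v[1] == 17) {
            return "static DefaultContext.load(...)";
        }
        throw new UnsupportedOperationException(
            "Flink version " + flinkVersion + " is not supported currently.");
    }

    public static void main(String[] args) {
        System.out.println(contextLoaderFor("1.17.1"));
    }
}
```

Keeping the version check in one place means a future Flink release only needs a new branch here rather than changes at every call site.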
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLBackendService.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLBackendService.scala
index d049e3c80bf..9802f195546 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLBackendService.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLBackendService.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.engine.flink
-import org.apache.flink.table.client.gateway.context.DefaultContext
+import org.apache.flink.table.gateway.service.context.DefaultContext
import org.apache.kyuubi.engine.flink.session.FlinkSQLSessionManager
import org.apache.kyuubi.service.AbstractBackendService
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLEngine.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLEngine.scala
index 06fdc65ae61..8838799bc24 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLEngine.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/FlinkSQLEngine.scala
@@ -18,23 +18,21 @@
package org.apache.kyuubi.engine.flink
import java.io.File
-import java.net.URL
import java.nio.file.Paths
-import java.time.Instant
+import java.time.Duration
import java.util.concurrent.CountDownLatch
import scala.collection.JavaConverters._
-import scala.collection.mutable.ListBuffer
-import org.apache.flink.client.cli.{DefaultCLI, GenericCLI}
-import org.apache.flink.configuration.{Configuration, DeploymentOptions, GlobalConfiguration}
-import org.apache.flink.table.client.SqlClientException
-import org.apache.flink.table.client.gateway.context.DefaultContext
-import org.apache.flink.util.JarUtils
+import org.apache.flink.configuration.{Configuration, DeploymentOptions, GlobalConfiguration, PipelineOptions}
+import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.gateway.api.config.SqlGatewayServiceConfigOptions
+import org.apache.flink.table.gateway.service.context.DefaultContext
-import org.apache.kyuubi.{KyuubiSQLException, Logging, Utils}
+import org.apache.kyuubi.{Logging, Utils}
import org.apache.kyuubi.Utils.{addShutdownHook, currentUser, FLINK_ENGINE_SHUTDOWN_PRIORITY}
import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_NAME, KYUUBI_SESSION_USER_KEY}
import org.apache.kyuubi.engine.flink.FlinkSQLEngine.{countDownLatch, currentEngine}
import org.apache.kyuubi.service.Serverable
import org.apache.kyuubi.util.SignalRegister
@@ -71,9 +69,12 @@ object FlinkSQLEngine extends Logging {
def main(args: Array[String]): Unit = {
SignalRegister.registerLogger(logger)
+ info(s"Flink SQL engine classpath: ${System.getProperty("java.class.path")}")
+
FlinkEngineUtils.checkFlinkVersion()
try {
+ kyuubiConf.loadFileDefaults()
Utils.fromCommandLineArgs(args, kyuubiConf)
val flinkConfDir = sys.env.getOrElse(
"FLINK_CONF_DIR", {
@@ -93,51 +94,33 @@ object FlinkSQLEngine extends Logging {
flinkConf.addAll(Configuration.fromMap(flinkConfFromArgs.asJava))
val executionTarget = flinkConf.getString(DeploymentOptions.TARGET)
- // set cluster name for per-job and application mode
- executionTarget match {
- case "yarn-per-job" | "yarn-application" =>
- if (!flinkConf.containsKey("yarn.application.name")) {
- val appName = s"kyuubi_${user}_flink_${Instant.now}"
- flinkConf.setString("yarn.application.name", appName)
- }
- case "kubernetes-application" =>
- if (!flinkConf.containsKey("kubernetes.cluster-id")) {
- val appName = s"kyuubi-${user}-flink-${Instant.now}"
- flinkConf.setString("kubernetes.cluster-id", appName)
- }
- case other =>
- debug(s"Skip generating app name for execution target $other")
- }
-
- val cliOptions = FlinkEngineUtils.parseCliOptions(args)
- val jars = if (cliOptions.getJars != null) cliOptions.getJars.asScala else List.empty
- val libDirs =
- if (cliOptions.getLibraryDirs != null) cliOptions.getLibraryDirs.asScala else List.empty
- val dependencies = discoverDependencies(jars, libDirs)
- val engineContext = new DefaultContext(
- dependencies.asJava,
- flinkConf,
- Seq(new GenericCLI(flinkConf, flinkConfDir), new DefaultCLI).asJava)
+ setDeploymentConf(executionTarget, flinkConf)
kyuubiConf.setIfMissing(KyuubiConf.FRONTEND_THRIFT_BINARY_BIND_PORT, 0)
+ val engineContext = FlinkEngineUtils.getDefaultContext(args, flinkConf, flinkConfDir)
startEngine(engineContext)
- info("started engine...")
+ info("Flink engine started")
+
+ if ("yarn-application".equalsIgnoreCase(executionTarget)) {
+ bootstrapFlinkApplicationExecutor()
+ }
// blocking main thread
countDownLatch.await()
} catch {
case t: Throwable if currentEngine.isDefined =>
+        error("Fatal error occurred, stopping the engine", t)
currentEngine.foreach { engine =>
- error(t)
engine.stop()
}
case t: Throwable =>
- error("Create FlinkSQL Engine Failed", t)
+ error("Failed to create FlinkSQL Engine", t)
}
}
def startEngine(engineContext: DefaultContext): Unit = {
+ debug(s"Starting Flink SQL engine with default configuration: ${engineContext.getFlinkConfig}")
currentEngine = Some(new FlinkSQLEngine(engineContext))
currentEngine.foreach { engine =>
engine.initialize(kyuubiConf)
@@ -146,36 +129,39 @@ object FlinkSQLEngine extends Logging {
}
}
- private def discoverDependencies(
- jars: Seq[URL],
- libraries: Seq[URL]): List[URL] = {
- try {
- var dependencies: ListBuffer[URL] = ListBuffer()
- // find jar files
- jars.foreach { url =>
- JarUtils.checkJarFile(url)
- dependencies = dependencies += url
- }
- // find jar files in library directories
- libraries.foreach { libUrl =>
- val dir: File = new File(libUrl.toURI)
- if (!dir.isDirectory) throw new SqlClientException("Directory expected: " + dir)
- else if (!dir.canRead) throw new SqlClientException("Directory cannot be read: " + dir)
- val files: Array[File] = dir.listFiles
- if (files == null) throw new SqlClientException("Directory cannot be read: " + dir)
- files.foreach { f =>
- // only consider jars
- if (f.isFile && f.getAbsolutePath.toLowerCase.endsWith(".jar")) {
- val url: URL = f.toURI.toURL
- JarUtils.checkJarFile(url)
- dependencies = dependencies += url
- }
+ private def bootstrapFlinkApplicationExecutor() = {
+    // trigger an execution to initialize the EmbeddedExecutor with the default Flink conf
+ val flinkConf = new Configuration()
+ flinkConf.set(PipelineOptions.NAME, "kyuubi-bootstrap-sql")
+ debug(s"Running bootstrap Flink SQL in application mode with flink conf: $flinkConf.")
+ val tableEnv = TableEnvironment.create(flinkConf)
+ val res = tableEnv.executeSql("select 'kyuubi'")
+ res.await()
+ info("Bootstrap Flink SQL finished.")
+ }
+
+ private def setDeploymentConf(executionTarget: String, flinkConf: Configuration): Unit = {
+ // forward kyuubi engine variables to flink configuration
+ kyuubiConf.getOption("flink.app.name")
+ .foreach(flinkConf.setString(KYUUBI_ENGINE_NAME, _))
+
+ kyuubiConf.getOption(KYUUBI_SESSION_USER_KEY)
+ .foreach(flinkConf.setString(KYUUBI_SESSION_USER_KEY, _))
+
+ // force disable Flink's session timeout
+ flinkConf.set(
+ SqlGatewayServiceConfigOptions.SQL_GATEWAY_SESSION_IDLE_TIMEOUT,
+ Duration.ofMillis(0))
+
+ executionTarget match {
+ case "yarn-per-job" | "yarn-application" =>
+ if (flinkConf.containsKey("high-availability.cluster-id")) {
+ flinkConf.setString(
+ "yarn.application.id",
+ flinkConf.toMap.get("high-availability.cluster-id"))
}
- }
- dependencies.toList
- } catch {
- case e: Exception =>
- throw KyuubiSQLException(s"Could not load all required JAR files.", e)
+ case other =>
+ debug(s"Skip setting deployment conf for execution target $other")
}
}
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala
index 0438b98d1ad..0e0c476e2d4 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala
@@ -17,39 +17,25 @@
package org.apache.kyuubi.engine.flink.operation
-import java.time.{LocalDate, LocalTime}
-import java.util
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration.Duration
import org.apache.flink.api.common.JobID
-import org.apache.flink.table.api.ResultKind
-import org.apache.flink.table.client.gateway.TypedResult
-import org.apache.flink.table.data.{GenericArrayData, GenericMapData, RowData}
-import org.apache.flink.table.data.binary.{BinaryArrayData, BinaryMapData}
-import org.apache.flink.table.operations.{Operation, QueryOperation}
-import org.apache.flink.table.operations.command._
-import org.apache.flink.table.types.DataType
-import org.apache.flink.table.types.logical._
-import org.apache.flink.types.Row
+import org.apache.flink.table.gateway.api.operation.OperationHandle
import org.apache.kyuubi.Logging
-import org.apache.kyuubi.engine.flink.FlinkEngineUtils._
-import org.apache.kyuubi.engine.flink.result.ResultSet
-import org.apache.kyuubi.engine.flink.schema.RowSet.toHiveString
+import org.apache.kyuubi.engine.flink.FlinkEngineUtils
+import org.apache.kyuubi.engine.flink.result.ResultSetUtil
import org.apache.kyuubi.operation.OperationState
import org.apache.kyuubi.operation.log.OperationLog
-import org.apache.kyuubi.reflection.DynMethods
import org.apache.kyuubi.session.Session
-import org.apache.kyuubi.util.RowSetUtils
class ExecuteStatement(
session: Session,
override val statement: String,
override val shouldRunAsync: Boolean,
queryTimeout: Long,
- resultMaxRows: Int)
+ resultMaxRows: Int,
+ resultFetchTimeout: Duration)
extends FlinkOperation(session) with Logging {
private val operationLog: OperationLog =
@@ -65,10 +51,6 @@ class ExecuteStatement(
setHasResultSet(true)
}
- override protected def afterRun(): Unit = {
- OperationLog.removeCurrentOperationLog()
- }
-
override protected def runInternal(): Unit = {
addTimeoutMonitor(queryTimeout)
executeStatement()
@@ -77,21 +59,11 @@ class ExecuteStatement(
private def executeStatement(): Unit = {
try {
setState(OperationState.RUNNING)
- val operation = executor.parseStatement(sessionId, statement)
- operation match {
- case queryOperation: QueryOperation => runQueryOperation(queryOperation)
- case setOperation: SetOperation =>
- resultSet = OperationUtils.runSetOperation(setOperation, executor, sessionId)
- case resetOperation: ResetOperation =>
- resultSet = OperationUtils.runResetOperation(resetOperation, executor, sessionId)
- case addJarOperation: AddJarOperation if isFlinkVersionAtMost("1.15") =>
- resultSet = OperationUtils.runAddJarOperation(addJarOperation, executor, sessionId)
- case removeJarOperation: RemoveJarOperation =>
- resultSet = OperationUtils.runRemoveJarOperation(removeJarOperation, executor, sessionId)
- case showJarsOperation: ShowJarsOperation if isFlinkVersionAtMost("1.15") =>
- resultSet = OperationUtils.runShowJarOperation(showJarsOperation, executor, sessionId)
- case operation: Operation => runOperation(operation)
- }
+ val resultFetcher = executor.executeStatement(
+ new OperationHandle(getHandle.identifier),
+ statement)
+ jobId = FlinkEngineUtils.getResultJobId(resultFetcher)
+ resultSet = ResultSetUtil.fromResultFetcher(resultFetcher, resultMaxRows, resultFetchTimeout)
setState(OperationState.FINISHED)
} catch {
onError(cancel = true)
@@ -99,157 +71,4 @@ class ExecuteStatement(
shutdownTimeoutMonitor()
}
}
-
- private def runQueryOperation(operation: QueryOperation): Unit = {
- var resultId: String = null
- try {
- val resultDescriptor = executor.executeQuery(sessionId, operation)
- val dataTypes = resultDescriptor.getResultSchema.getColumnDataTypes.asScala.toList
-
- resultId = resultDescriptor.getResultId
-
- val rows = new ArrayBuffer[Row]()
- var loop = true
-
- while (loop) {
- Thread.sleep(50) // slow the processing down
-
- val pageSize = Math.min(500, resultMaxRows)
- val result = executor.snapshotResult(sessionId, resultId, pageSize)
- result.getType match {
- case TypedResult.ResultType.PAYLOAD =>
- (1 to result.getPayload).foreach { page =>
- if (rows.size < resultMaxRows) {
- val result = executor.retrieveResultPage(resultId, page)
- rows ++= result.asScala.map(r => convertToRow(r, dataTypes))
- } else {
- loop = false
- }
- }
- case TypedResult.ResultType.EOS => loop = false
- case TypedResult.ResultType.EMPTY =>
- }
- }
-
- resultSet = ResultSet.builder
- .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
- .columns(resultDescriptor.getResultSchema.getColumns)
- .data(rows.slice(0, resultMaxRows).toArray[Row])
- .build
- } finally {
- if (resultId != null) {
- cleanupQueryResult(resultId)
- }
- }
- }
-
- private def runOperation(operation: Operation): Unit = {
- val result = executor.executeOperation(sessionId, operation)
- jobId = result.getJobClient.asScala.map(_.getJobID)
- // after FLINK-24461, TableResult#await() would block insert statements
- // until the job finishes, instead of returning row affected immediately
- resultSet = ResultSet.fromTableResult(result)
- }
-
- private def cleanupQueryResult(resultId: String): Unit = {
- try {
- executor.cancelQuery(sessionId, resultId)
- } catch {
- case t: Throwable =>
- warn(s"Failed to clean result set $resultId in session $sessionId", t)
- }
- }
-
- private[this] def convertToRow(r: RowData, dataTypes: List[DataType]): Row = {
- val row = Row.withPositions(r.getRowKind, r.getArity)
- for (i <- 0 until r.getArity) {
- val dataType = dataTypes(i)
- dataType.getLogicalType match {
- case arrayType: ArrayType =>
- val arrayData = r.getArray(i)
- if (arrayData == null) {
- row.setField(i, null)
- }
- arrayData match {
- case d: GenericArrayData =>
- row.setField(i, d.toObjectArray)
- case d: BinaryArrayData =>
- row.setField(i, d.toObjectArray(arrayType.getElementType))
- case _ =>
- }
- case _: BinaryType =>
- row.setField(i, r.getBinary(i))
- case _: BigIntType =>
- row.setField(i, r.getLong(i))
- case _: BooleanType =>
- row.setField(i, r.getBoolean(i))
- case _: VarCharType | _: CharType =>
- row.setField(i, r.getString(i))
- case t: DecimalType =>
- row.setField(i, r.getDecimal(i, t.getPrecision, t.getScale).toBigDecimal)
- case _: DateType =>
- val date = RowSetUtils.formatLocalDate(LocalDate.ofEpochDay(r.getInt(i)))
- row.setField(i, date)
- case _: TimeType =>
- val time = RowSetUtils.formatLocalTime(LocalTime.ofNanoOfDay(r.getLong(i) * 1000 * 1000))
- row.setField(i, time)
- case t: TimestampType =>
- val ts = RowSetUtils
- .formatLocalDateTime(r.getTimestamp(i, t.getPrecision)
- .toLocalDateTime)
- row.setField(i, ts)
- case _: TinyIntType =>
- row.setField(i, r.getByte(i))
- case _: SmallIntType =>
- row.setField(i, r.getShort(i))
- case _: IntType =>
- row.setField(i, r.getInt(i))
- case _: FloatType =>
- row.setField(i, r.getFloat(i))
- case mapType: MapType =>
- val mapData = r.getMap(i)
- if (mapData != null && mapData.size > 0) {
- val keyType = mapType.getKeyType
- val valueType = mapType.getValueType
- mapData match {
- case d: BinaryMapData =>
- val kvArray = toArray(keyType, valueType, d)
- val map: util.Map[Any, Any] = new util.HashMap[Any, Any]
- for (i <- kvArray._1.indices) {
- val value: Any = kvArray._2(i)
- map.put(kvArray._1(i), value)
- }
- row.setField(i, map)
- case d: GenericMapData => // TODO
- }
- } else {
- row.setField(i, null)
- }
- case _: DoubleType =>
- row.setField(i, r.getDouble(i))
- case t: RowType =>
- val fieldDataTypes = DynMethods.builder("getFieldDataTypes")
- .impl(classOf[DataType], classOf[DataType])
- .buildStatic
- .invoke[util.List[DataType]](dataType)
- .asScala.toList
- val internalRowData = r.getRow(i, t.getFieldCount)
- val internalRow = convertToRow(internalRowData, fieldDataTypes)
- row.setField(i, internalRow)
- case t =>
- val hiveString = toHiveString((row.getField(i), t))
- row.setField(i, hiveString)
- }
- }
- row
- }
-
- private[this] def toArray(
- keyType: LogicalType,
- valueType: LogicalType,
- arrayData: BinaryMapData): (Array[_], Array[_]) = {
-
- arrayData.keyArray().toObjectArray(keyType) -> arrayData.valueArray().toObjectArray(valueType)
- }
-
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperation.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperation.scala
index 2859d659e62..1424b721c4b 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperation.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperation.scala
@@ -18,12 +18,17 @@
package org.apache.kyuubi.engine.flink.operation
import java.io.IOException
+import java.time.ZoneId
+import java.util.concurrent.TimeoutException
import scala.collection.JavaConverters.collectionAsScalaIterableConverter
+import scala.collection.mutable.ListBuffer
-import org.apache.flink.table.client.gateway.Executor
-import org.apache.flink.table.client.gateway.context.SessionContext
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TRowSet, TTableSchema}
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.table.gateway.service.context.SessionContext
+import org.apache.flink.table.gateway.service.operation.OperationExecutor
+import org.apache.flink.types.Row
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp, TTableSchema}
import org.apache.kyuubi.{KyuubiSQLException, Utils}
import org.apache.kyuubi.engine.flink.result.ResultSet
@@ -36,12 +41,16 @@ import org.apache.kyuubi.session.Session
abstract class FlinkOperation(session: Session) extends AbstractOperation(session) {
+ protected val flinkSession: org.apache.flink.table.gateway.service.session.Session =
+ session.asInstanceOf[FlinkSessionImpl].fSession
+
+ protected val executor: OperationExecutor = flinkSession.createExecutor(
+ Configuration.fromMap(flinkSession.getSessionConfig))
+
protected val sessionContext: SessionContext = {
session.asInstanceOf[FlinkSessionImpl].sessionContext
}
- protected val executor: Executor = session.asInstanceOf[FlinkSessionImpl].executor
-
protected val sessionId: String = session.handle.identifier.toString
protected var resultSet: ResultSet = _
@@ -52,7 +61,7 @@ abstract class FlinkOperation(session: Session) extends AbstractOperation(sessio
}
override protected def afterRun(): Unit = {
- state.synchronized {
+ withLockRequired {
if (!isTerminalState(state)) {
setState(OperationState.FINISHED)
}
@@ -66,6 +75,10 @@ abstract class FlinkOperation(session: Session) extends AbstractOperation(sessio
override def close(): Unit = {
cleanup(OperationState.CLOSED)
+ // the result set may be null if the operation ends exceptionally
+ if (resultSet != null) {
+      resultSet.close()
+ }
try {
getOperationLog.foreach(_.close())
} catch {
@@ -85,22 +98,50 @@ abstract class FlinkOperation(session: Session) extends AbstractOperation(sessio
resp
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
order match {
- case FETCH_NEXT => resultSet.getData.fetchNext()
case FETCH_PRIOR => resultSet.getData.fetchPrior(rowSetSize);
case FETCH_FIRST => resultSet.getData.fetchAbsolute(0);
+ case FETCH_NEXT => // ignored because new data are fetched lazily
+ }
+ val batch = new ListBuffer[Row]
+ try {
+ // there could be null values at the end of the batch
+ // because Flink could return an EOS
+ var rows = 0
+ while (resultSet.getData.hasNext && rows < rowSetSize) {
+ Option(resultSet.getData.next()).foreach { r => batch += r; rows += 1 }
+ }
+ } catch {
+ case e: TimeoutException =>
+ // ignore and return the current batch if there's some data
+ // otherwise, rethrow the timeout exception
+ if (batch.nonEmpty) {
+ debug(s"Timeout fetching more data for $opType operation. " +
+ s"Returning the current fetched data.")
+ } else {
+ throw e
+ }
+ }
+ val timeZone = Option(flinkSession.getSessionConfig.get("table.local-time-zone"))
+ val zoneId = timeZone match {
+ case Some(tz) => ZoneId.of(tz)
+ case None => ZoneId.systemDefault()
}
- val token = resultSet.getData.take(rowSetSize)
val resultRowSet = RowSet.resultSetToTRowSet(
- token.toList,
+ batch.toList,
resultSet,
+ zoneId,
getProtocolVersion)
- resultRowSet.setStartRowOffset(resultSet.getData.getPosition)
- resultRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(resultRowSet)
+ resp.setHasMoreRows(resultSet.getData.hasNext)
+ resp
}
override def shouldRunAsync: Boolean = false
@@ -109,7 +150,7 @@ abstract class FlinkOperation(session: Session) extends AbstractOperation(sessio
// We should use Throwable instead of Exception since `java.lang.NoClassDefFoundError`
// could be thrown.
case e: Throwable =>
- state.synchronized {
+ withLockRequired {
val errMsg = Utils.stringifyException(e)
if (state == OperationState.TIMEOUT) {
val ke = KyuubiSQLException(s"Timeout operating $opType: $errMsg")
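The reworked `getNextRowSetInternal` above drains rows until the batch is full or the source times out, returning a partial batch when any rows were fetched and rethrowing otherwise. A standalone sketch of that policy, not part of the patch, with a hypothetical `RowSource` standing in for Flink's result iterator:

```scala
import java.util.concurrent.TimeoutException
import scala.collection.mutable.ListBuffer

// Hypothetical row source: serves rows until exhausted, times out after `failAfter` rows.
final class RowSource(rows: Iterator[Int], failAfter: Int) {
  private var served = 0
  def hasNext: Boolean = rows.hasNext
  def next(): Int = {
    if (served >= failAfter) throw new TimeoutException("fetch timed out")
    served += 1
    rows.next()
  }
}

// Drain up to rowSetSize rows; on timeout, return what was fetched so far
// if non-empty, otherwise surface the timeout to the caller.
def fetchBatch(source: RowSource, rowSetSize: Int): List[Int] = {
  val batch = new ListBuffer[Int]
  try {
    while (source.hasNext && batch.size < rowSetSize) {
      batch += source.next()
    }
  } catch {
    case e: TimeoutException =>
      if (batch.isEmpty) throw e // no partial data to return
  }
  batch.toList
}
```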
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkSQLOperationManager.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkSQLOperationManager.scala
index d7b5e297d1a..d5c0629eedd 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkSQLOperationManager.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/FlinkSQLOperationManager.scala
@@ -20,9 +20,12 @@ package org.apache.kyuubi.engine.flink.operation
import java.util
import scala.collection.JavaConverters._
+import scala.concurrent.duration.{Duration, DurationLong}
+import scala.language.postfixOps
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.config.KyuubiConf._
+import org.apache.kyuubi.engine.flink.FlinkEngineUtils
import org.apache.kyuubi.engine.flink.result.Constants
import org.apache.kyuubi.engine.flink.session.FlinkSessionImpl
import org.apache.kyuubi.operation.{NoneMode, Operation, OperationManager, PlanOnlyMode}
@@ -44,7 +47,8 @@ class FlinkSQLOperationManager extends OperationManager("FlinkSQLOperationManage
runAsync: Boolean,
queryTimeout: Long): Operation = {
val flinkSession = session.asInstanceOf[FlinkSessionImpl]
- if (flinkSession.sessionContext.getConfigMap.getOrDefault(
+ val sessionConfig = flinkSession.fSession.getSessionConfig
+ if (sessionConfig.getOrDefault(
ENGINE_OPERATION_CONVERT_CATALOG_DATABASE_ENABLED.key,
operationConvertCatalogDatabaseDefault.toString).toBoolean) {
val catalogDatabaseOperation = processCatalogDatabase(session, statement, confOverlay)
@@ -53,23 +57,42 @@ class FlinkSQLOperationManager extends OperationManager("FlinkSQLOperationManage
}
}
- val mode = PlanOnlyMode.fromString(flinkSession.sessionContext.getConfigMap.getOrDefault(
- OPERATION_PLAN_ONLY_MODE.key,
- operationModeDefault))
+ val mode = PlanOnlyMode.fromString(
+ sessionConfig.getOrDefault(
+ OPERATION_PLAN_ONLY_MODE.key,
+ operationModeDefault))
- flinkSession.sessionContext.set(OPERATION_PLAN_ONLY_MODE.key, mode.name)
+ val sessionContext = FlinkEngineUtils.getSessionContext(flinkSession.fSession)
+ sessionContext.set(OPERATION_PLAN_ONLY_MODE.key, mode.name)
val resultMaxRows =
flinkSession.normalizedConf.getOrElse(
ENGINE_FLINK_MAX_ROWS.key,
resultMaxRowsDefault.toString).toInt
+
+ val resultFetchTimeout =
+ flinkSession.normalizedConf.get(ENGINE_FLINK_FETCH_TIMEOUT.key).map(_.toLong milliseconds)
+ .getOrElse(Duration.Inf)
+
val op = mode match {
case NoneMode =>
// FLINK-24427 seals calcite classes which required to access in async mode, considering
// there is no much benefit in async mode, here we just ignore `runAsync` and always run
// statement in sync mode as a workaround
- new ExecuteStatement(session, statement, false, queryTimeout, resultMaxRows)
+ new ExecuteStatement(
+ session,
+ statement,
+ false,
+ queryTimeout,
+ resultMaxRows,
+ resultFetchTimeout)
case mode =>
- new PlanOnlyStatement(session, statement, mode)
+ new PlanOnlyStatement(
+ session,
+ statement,
+ mode,
+ queryTimeout,
+ resultMaxRows,
+ resultFetchTimeout)
}
addOperation(op)
}
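The new `resultFetchTimeout` above treats an absent fetch-timeout setting as an unbounded wait (`Duration.Inf`). That resolution can be sketched as follows; the key name and helper are hypothetical, only the absent-means-infinite policy mirrors the patch:

```scala
import scala.concurrent.duration.{Duration, DurationLong}

// Resolve an optional millisecond timeout from a session conf map;
// a missing key means wait indefinitely.
def resolveFetchTimeout(conf: Map[String, String], key: String): Duration =
  conf.get(key).map(_.toLong.milliseconds).getOrElse(Duration.Inf)
```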
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCatalogs.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCatalogs.scala
index 11dd760e4ec..2453716812d 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCatalogs.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCatalogs.scala
@@ -17,6 +17,8 @@
package org.apache.kyuubi.engine.flink.operation
+import scala.collection.convert.ImplicitConversions._
+
import org.apache.kyuubi.engine.flink.result.ResultSetUtil
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_CAT
import org.apache.kyuubi.session.Session
@@ -25,8 +27,8 @@ class GetCatalogs(session: Session) extends FlinkOperation(session) {
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- val catalogs = tableEnv.listCatalogs.toList
+ val catalogManager = sessionContext.getSessionState.catalogManager
+ val catalogs = catalogManager.listCatalogs.toList
resultSet = ResultSetUtil.stringListToResultSet(catalogs, TABLE_CAT)
} catch onError()
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetColumns.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetColumns.scala
index 6ce2a6ac7e7..b1a7c0c3ee5 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetColumns.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetColumns.scala
@@ -21,7 +21,7 @@ import scala.collection.JavaConverters._
import org.apache.commons.lang3.StringUtils
import org.apache.flink.table.api.{DataTypes, ResultKind}
-import org.apache.flink.table.catalog.Column
+import org.apache.flink.table.catalog.{Column, ObjectIdentifier}
import org.apache.flink.table.types.logical._
import org.apache.flink.types.Row
@@ -40,17 +40,17 @@ class GetColumns(
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
val catalogName =
- if (StringUtils.isEmpty(catalogNameOrEmpty)) tableEnv.getCurrentCatalog
+ if (StringUtils.isEmpty(catalogNameOrEmpty)) executor.getCurrentCatalog
else catalogNameOrEmpty
val schemaNameRegex = toJavaRegex(schemaNamePattern)
val tableNameRegex = toJavaRegex(tableNamePattern)
val columnNameRegex = toJavaRegex(columnNamePattern).r
- val columns = tableEnv.getCatalog(catalogName).asScala.toArray.flatMap { flinkCatalog =>
+ val catalogManager = sessionContext.getSessionState.catalogManager
+ val columns = catalogManager.getCatalog(catalogName).asScala.toArray.flatMap { flinkCatalog =>
SchemaHelper.getSchemasWithPattern(flinkCatalog, schemaNameRegex)
.flatMap { schemaName =>
SchemaHelper.getFlinkTablesWithPattern(
@@ -60,7 +60,8 @@ class GetColumns(
tableNameRegex)
.filter { _._2.isDefined }
.flatMap { case (tableName, _) =>
- val flinkTable = tableEnv.from(s"`$catalogName`.`$schemaName`.`$tableName`")
+ val flinkTable = catalogManager.getTable(
+ ObjectIdentifier.of(catalogName, schemaName, tableName)).get()
val resolvedSchema = flinkTable.getResolvedSchema
resolvedSchema.getColumns.asScala.toArray.zipWithIndex
.filter { case (column, _) =>
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentCatalog.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentCatalog.scala
index 988072e8da4..5f82de4a689 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentCatalog.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentCatalog.scala
@@ -18,15 +18,20 @@
package org.apache.kyuubi.engine.flink.operation
import org.apache.kyuubi.engine.flink.result.ResultSetUtil
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_CAT
import org.apache.kyuubi.session.Session
class GetCurrentCatalog(session: Session) extends FlinkOperation(session) {
+ private val operationLog: OperationLog =
+ OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- val catalog = tableEnv.getCurrentCatalog
+ val catalog = executor.getCurrentCatalog
resultSet = ResultSetUtil.stringListToResultSet(List(catalog), TABLE_CAT)
} catch onError()
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentDatabase.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentDatabase.scala
index 8315a18d3d8..107609c0639 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentDatabase.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetCurrentDatabase.scala
@@ -18,15 +18,20 @@
package org.apache.kyuubi.engine.flink.operation
import org.apache.kyuubi.engine.flink.result.ResultSetUtil
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_SCHEM
import org.apache.kyuubi.session.Session
class GetCurrentDatabase(session: Session) extends FlinkOperation(session) {
+ private val operationLog: OperationLog =
+ OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- val database = tableEnv.getCurrentDatabase
+ val database = sessionContext.getSessionState.catalogManager.getCurrentDatabase
resultSet = ResultSetUtil.stringListToResultSet(List(database), TABLE_SCHEM)
} catch onError()
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetFunctions.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetFunctions.scala
index ab870ab7931..85f34a29a05 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetFunctions.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetFunctions.scala
@@ -20,9 +20,10 @@ package org.apache.kyuubi.engine.flink.operation
import java.sql.DatabaseMetaData
import scala.collection.JavaConverters._
+import scala.collection.convert.ImplicitConversions._
import org.apache.commons.lang3.StringUtils
-import org.apache.flink.table.api.{DataTypes, ResultKind, TableEnvironment}
+import org.apache.flink.table.api.{DataTypes, ResultKind}
import org.apache.flink.table.catalog.Column
import org.apache.flink.types.Row
@@ -42,17 +43,20 @@ class GetFunctions(
try {
val schemaPattern = toJavaRegex(schemaName)
val functionPattern = toJavaRegex(functionName)
- val tableEnv: TableEnvironment = sessionContext.getExecutionContext.getTableEnvironment
+ val functionCatalog = sessionContext.getSessionState.functionCatalog
+ val catalogManager = sessionContext.getSessionState.catalogManager
+
val systemFunctions = filterPattern(
- tableEnv.listFunctions().diff(tableEnv.listUserDefinedFunctions()),
+ functionCatalog.getFunctions
+ .diff(functionCatalog.getUserDefinedFunctions),
functionPattern)
.map { f =>
Row.of(null, null, f, null, Integer.valueOf(DatabaseMetaData.functionResultUnknown), null)
- }
- val catalogFunctions = tableEnv.listCatalogs()
+ }.toArray
+ val catalogFunctions = catalogManager.listCatalogs()
.filter { c => StringUtils.isEmpty(catalogName) || c == catalogName }
.flatMap { c =>
- val catalog = tableEnv.getCatalog(c).get()
+ val catalog = catalogManager.getCatalog(c).get()
filterPattern(catalog.listDatabases().asScala, schemaPattern)
.flatMap { d =>
filterPattern(catalog.listFunctions(d).asScala, functionPattern)
@@ -66,7 +70,7 @@ class GetFunctions(
null)
}
}
- }
+ }.toArray
resultSet = ResultSet.builder.resultKind(ResultKind.SUCCESS_WITH_CONTENT)
.columns(
Column.physical(FUNCTION_CAT, DataTypes.STRING()),
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetPrimaryKeys.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetPrimaryKeys.scala
index b534feb1fd9..5b9060cf184 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetPrimaryKeys.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetPrimaryKeys.scala
@@ -21,8 +21,9 @@ import scala.collection.JavaConverters._
import org.apache.commons.lang3.StringUtils
import org.apache.flink.table.api.{DataTypes, ResultKind}
-import org.apache.flink.table.catalog.Column
+import org.apache.flink.table.catalog.{Column, ObjectIdentifier}
import org.apache.flink.types.Row
+import org.apache.flink.util.FlinkException
import org.apache.kyuubi.engine.flink.result.ResultSet
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
@@ -37,22 +38,25 @@ class GetPrimaryKeys(
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
+ val catalogManager = sessionContext.getSessionState.catalogManager
val catalogName =
- if (StringUtils.isEmpty(catalogNameOrEmpty)) tableEnv.getCurrentCatalog
+ if (StringUtils.isEmpty(catalogNameOrEmpty)) catalogManager.getCurrentCatalog
else catalogNameOrEmpty
val schemaName =
if (StringUtils.isEmpty(schemaNameOrEmpty)) {
- if (catalogName != tableEnv.getCurrentCatalog) {
- tableEnv.getCatalog(catalogName).get().getDefaultDatabase
+ if (catalogName != executor.getCurrentCatalog) {
+ catalogManager.getCatalog(catalogName).get().getDefaultDatabase
} else {
- tableEnv.getCurrentDatabase
+ catalogManager.getCurrentDatabase
}
} else schemaNameOrEmpty
- val flinkTable = tableEnv.from(s"`$catalogName`.`$schemaName`.`$tableName`")
+ val flinkTable = catalogManager
+ .getTable(ObjectIdentifier.of(catalogName, schemaName, tableName))
+ .orElseThrow(() =>
+        new FlinkException(s"Table `$catalogName`.`$schemaName`.`$tableName` not found."))
val resolvedSchema = flinkTable.getResolvedSchema
val primaryKeySchema = resolvedSchema.getPrimaryKey
@@ -102,5 +106,4 @@ class GetPrimaryKeys(
)
// format: on
}
-
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetSchemas.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetSchemas.scala
index 6715b232073..f56ddd8b18e 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetSchemas.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetSchemas.scala
@@ -18,9 +18,10 @@
package org.apache.kyuubi.engine.flink.operation
import scala.collection.JavaConverters._
+import scala.collection.convert.ImplicitConversions._
import org.apache.commons.lang3.StringUtils
-import org.apache.flink.table.api.{DataTypes, ResultKind, TableEnvironment}
+import org.apache.flink.table.api.{DataTypes, ResultKind}
import org.apache.flink.table.catalog.Column
import org.apache.flink.types.Row
@@ -35,14 +36,14 @@ class GetSchemas(session: Session, catalogName: String, schema: String)
override protected def runInternal(): Unit = {
try {
val schemaPattern = toJavaRegex(schema)
- val tableEnv: TableEnvironment = sessionContext.getExecutionContext.getTableEnvironment
- val schemas = tableEnv.listCatalogs()
+ val catalogManager = sessionContext.getSessionState.catalogManager
+ val schemas = catalogManager.listCatalogs()
.filter { c => StringUtils.isEmpty(catalogName) || c == catalogName }
.flatMap { c =>
- val catalog = tableEnv.getCatalog(c).get()
+ val catalog = catalogManager.getCatalog(c).get()
filterPattern(catalog.listDatabases().asScala, schemaPattern)
.map { d => Row.of(d, c) }
- }
+ }.toArray
resultSet = ResultSet.builder.resultKind(ResultKind.SUCCESS_WITH_CONTENT)
.columns(
Column.physical(TABLE_SCHEM, DataTypes.STRING()),
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetTables.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetTables.scala
index a4e55715a5a..325a501671e 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetTables.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/GetTables.scala
@@ -37,16 +37,16 @@ class GetTables(
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
+ val catalogManager = sessionContext.getSessionState.catalogManager
val catalogName =
- if (StringUtils.isEmpty(catalogNameOrEmpty)) tableEnv.getCurrentCatalog
+ if (StringUtils.isEmpty(catalogNameOrEmpty)) catalogManager.getCurrentCatalog
else catalogNameOrEmpty
val schemaNameRegex = toJavaRegex(schemaNamePattern)
val tableNameRegex = toJavaRegex(tableNamePattern)
- val tables = tableEnv.getCatalog(catalogName).asScala.toArray.flatMap { flinkCatalog =>
+ val tables = catalogManager.getCatalog(catalogName).asScala.toArray.flatMap { flinkCatalog =>
SchemaHelper.getSchemasWithPattern(flinkCatalog, schemaNameRegex)
.flatMap { schemaName =>
SchemaHelper.getFlinkTablesWithPattern(
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/OperationUtils.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/OperationUtils.scala
deleted file mode 100644
index 7d624948c18..00000000000
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/OperationUtils.scala
+++ /dev/null
@@ -1,172 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.engine.flink.operation
-
-import java.util
-
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.flink.table.api.{DataTypes, ResultKind}
-import org.apache.flink.table.catalog.Column
-import org.apache.flink.table.client.gateway.Executor
-import org.apache.flink.table.operations.command._
-import org.apache.flink.types.Row
-
-import org.apache.kyuubi.engine.flink.result.{ResultSet, ResultSetUtil}
-import org.apache.kyuubi.engine.flink.result.ResultSetUtil.successResultSet
-import org.apache.kyuubi.reflection.DynMethods
-
-object OperationUtils {
-
- /**
- * Runs a SetOperation with executor. Returns when SetOperation is executed successfully.
- *
- * @param setOperation Set operation.
- * @param executor A gateway for communicating with Flink and other external systems.
- * @param sessionId Id of the session.
- * @return A ResultSet of SetOperation execution.
- */
- def runSetOperation(
- setOperation: SetOperation,
- executor: Executor,
- sessionId: String): ResultSet = {
- if (setOperation.getKey.isPresent) {
- val key: String = setOperation.getKey.get.trim
-
- if (setOperation.getValue.isPresent) {
- val newValue: String = setOperation.getValue.get.trim
- executor.setSessionProperty(sessionId, key, newValue)
- }
-
- val value = executor.getSessionConfigMap(sessionId).getOrDefault(key, "")
- ResultSet.builder
- .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
- .columns(
- Column.physical("key", DataTypes.STRING()),
- Column.physical("value", DataTypes.STRING()))
- .data(Array(Row.of(key, value)))
- .build
- } else {
- // show all properties if set without key
- val properties: util.Map[String, String] = executor.getSessionConfigMap(sessionId)
-
- val entries = ArrayBuffer.empty[Row]
- properties.forEach((key, value) => entries.append(Row.of(key, value)))
-
- if (entries.nonEmpty) {
- val prettyEntries = entries.sortBy(_.getField(0).asInstanceOf[String])
- ResultSet.builder
- .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
- .columns(
- Column.physical("key", DataTypes.STRING()),
- Column.physical("value", DataTypes.STRING()))
- .data(prettyEntries.toArray)
- .build
- } else {
- ResultSet.builder
- .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
- .columns(
- Column.physical("key", DataTypes.STRING()),
- Column.physical("value", DataTypes.STRING()))
- .data(Array[Row]())
- .build
- }
- }
- }
-
- /**
- * Runs a ResetOperation with executor. Returns when ResetOperation is executed successfully.
- *
- * @param resetOperation Reset operation.
- * @param executor A gateway for communicating with Flink and other external systems.
- * @param sessionId Id of the session.
- * @return A ResultSet of ResetOperation execution.
- */
- def runResetOperation(
- resetOperation: ResetOperation,
- executor: Executor,
- sessionId: String): ResultSet = {
- if (resetOperation.getKey.isPresent) {
- // reset the given property
- executor.resetSessionProperty(sessionId, resetOperation.getKey.get())
- } else {
- // reset all properties
- executor.resetSessionProperties(sessionId)
- }
- successResultSet
- }
-
- /**
- * Runs a AddJarOperation with the executor. Currently only jars on local filesystem
- * are supported.
- *
- * @param addJarOperation Add-jar operation.
- * @param executor A gateway for communicating with Flink and other external systems.
- * @param sessionId Id of the session.
- * @return A ResultSet of AddJarOperation execution.
- */
- def runAddJarOperation(
- addJarOperation: AddJarOperation,
- executor: Executor,
- sessionId: String): ResultSet = {
- // Removed by FLINK-27790
- val addJar = DynMethods.builder("addJar")
- .impl(executor.getClass, classOf[String], classOf[String])
- .build(executor)
- addJar.invoke[Void](sessionId, addJarOperation.getPath)
- successResultSet
- }
-
- /**
- * Runs a RemoveJarOperation with the executor. Only jars added by AddJarOperation could
- * be removed.
- *
- * @param removeJarOperation Remove-jar operation.
- * @param executor A gateway for communicating with Flink and other external systems.
- * @param sessionId Id of the session.
- * @return A ResultSet of RemoveJarOperation execution.
- */
- def runRemoveJarOperation(
- removeJarOperation: RemoveJarOperation,
- executor: Executor,
- sessionId: String): ResultSet = {
- executor.removeJar(sessionId, removeJarOperation.getPath)
- successResultSet
- }
-
- /**
- * Runs a ShowJarsOperation with the executor. Returns the jars of the current session.
- *
- * @param showJarsOperation Show-jar operation.
- * @param executor A gateway for communicating with Flink and other external systems.
- * @param sessionId Id of the session.
- * @return A ResultSet of ShowJarsOperation execution.
- */
- def runShowJarOperation(
- showJarsOperation: ShowJarsOperation,
- executor: Executor,
- sessionId: String): ResultSet = {
- // Removed by FLINK-27790
- val listJars = DynMethods.builder("listJars")
- .impl(executor.getClass, classOf[String])
- .build(executor)
- val jars = listJars.invoke[util.List[String]](sessionId)
- ResultSetUtil.stringListToResultSet(jars.asScala.toList, "jar")
- }
-}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyStatement.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyStatement.scala
index afe04a30736..1284bfd73e6 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyStatement.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyStatement.scala
@@ -17,10 +17,13 @@
package org.apache.kyuubi.engine.flink.operation
+import scala.concurrent.duration.Duration
+
+import com.google.common.base.Preconditions
import org.apache.flink.table.api.TableEnvironment
+import org.apache.flink.table.gateway.api.operation.OperationHandle
import org.apache.flink.table.operations.command._
-import org.apache.kyuubi.engine.flink.FlinkEngineUtils.isFlinkVersionAtMost
import org.apache.kyuubi.engine.flink.result.ResultSetUtil
import org.apache.kyuubi.operation.{ExecutionMode, ParseMode, PhysicalMode, PlanOnlyMode, UnknownMode}
import org.apache.kyuubi.operation.PlanOnlyMode.{notSupportedModeError, unknownModeError}
@@ -33,7 +36,10 @@ import org.apache.kyuubi.session.Session
class PlanOnlyStatement(
session: Session,
override val statement: String,
- mode: PlanOnlyMode) extends FlinkOperation(session) {
+ mode: PlanOnlyMode,
+ queryTimeout: Long,
+ resultMaxRows: Int,
+ resultFetchTimeout: Duration) extends FlinkOperation(session) {
private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
private val lineSeparator: String = System.lineSeparator()
@@ -45,19 +51,22 @@ class PlanOnlyStatement(
}
override protected def runInternal(): Unit = {
+ addTimeoutMonitor(queryTimeout)
try {
- val operation = executor.parseStatement(sessionId, statement)
+ val operations = executor.getTableEnvironment.getParser.parse(statement)
+ Preconditions.checkArgument(
+ operations.size() == 1,
+ "Plan-only mode supports single statement only",
+ null)
+ val operation = operations.get(0)
operation match {
- case setOperation: SetOperation =>
- resultSet = OperationUtils.runSetOperation(setOperation, executor, sessionId)
- case resetOperation: ResetOperation =>
- resultSet = OperationUtils.runResetOperation(resetOperation, executor, sessionId)
- case addJarOperation: AddJarOperation if isFlinkVersionAtMost("1.15") =>
- resultSet = OperationUtils.runAddJarOperation(addJarOperation, executor, sessionId)
- case removeJarOperation: RemoveJarOperation =>
- resultSet = OperationUtils.runRemoveJarOperation(removeJarOperation, executor, sessionId)
- case showJarsOperation: ShowJarsOperation if isFlinkVersionAtMost("1.15") =>
- resultSet = OperationUtils.runShowJarOperation(showJarsOperation, executor, sessionId)
+ case _: SetOperation | _: ResetOperation | _: AddJarOperation | _: RemoveJarOperation |
+ _: ShowJarsOperation =>
+ val resultFetcher = executor.executeStatement(
+ new OperationHandle(getHandle.identifier),
+ statement)
+ resultSet =
+ ResultSetUtil.fromResultFetcher(resultFetcher, resultMaxRows, resultFetchTimeout);
case _ => explainOperation(statement)
}
} catch {
@@ -66,7 +75,7 @@ class PlanOnlyStatement(
}
private def explainOperation(statement: String): Unit = {
- val tableEnv: TableEnvironment = sessionContext.getExecutionContext.getTableEnvironment
+ val tableEnv: TableEnvironment = executor.getTableEnvironment
val explainPlans =
tableEnv.explainSql(statement).split(s"$lineSeparator$lineSeparator")
val operationPlan = mode match {
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentCatalog.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentCatalog.scala
index 489cc638458..f279ccda616 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentCatalog.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentCatalog.scala
@@ -17,15 +17,21 @@
package org.apache.kyuubi.engine.flink.operation
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentCatalog(session: Session, catalog: String)
extends FlinkOperation(session) {
+ private val operationLog: OperationLog =
+ OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- tableEnv.useCatalog(catalog)
+ val catalogManager = sessionContext.getSessionState.catalogManager
+ catalogManager.setCurrentCatalog(catalog)
setHasResultSet(false)
} catch onError()
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentDatabase.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentDatabase.scala
index 0d3598405d8..70535e8344f 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentDatabase.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/SetCurrentDatabase.scala
@@ -17,15 +17,21 @@
package org.apache.kyuubi.engine.flink.operation
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentDatabase(session: Session, database: String)
extends FlinkOperation(session) {
+ private val operationLog: OperationLog =
+ OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- tableEnv.useDatabase(database)
+ val catalogManager = sessionContext.getSessionState.catalogManager
+ catalogManager.setCurrentDatabase(database)
setHasResultSet(false)
} catch onError()
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/QueryResultFetchIterator.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/QueryResultFetchIterator.scala
new file mode 100644
index 00000000000..60ae08d9dd8
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/QueryResultFetchIterator.scala
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.result
+
+import java.util
+import java.util.concurrent.Executors
+
+import scala.collection.convert.ImplicitConversions._
+import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutor, Future}
+import scala.concurrent.duration.Duration
+
+import com.google.common.util.concurrent.ThreadFactoryBuilder
+import org.apache.flink.table.api.DataTypes
+import org.apache.flink.table.catalog.ResolvedSchema
+import org.apache.flink.table.data.RowData
+import org.apache.flink.table.data.conversion.DataStructureConverters
+import org.apache.flink.table.gateway.service.result.ResultFetcher
+import org.apache.flink.table.types.DataType
+import org.apache.flink.types.Row
+
+import org.apache.kyuubi.Logging
+import org.apache.kyuubi.engine.flink.shim.FlinkResultSet
+import org.apache.kyuubi.operation.FetchIterator
+
+class QueryResultFetchIterator(
+ resultFetcher: ResultFetcher,
+ maxRows: Int = 1000000,
+ resultFetchTimeout: Duration = Duration.Inf) extends FetchIterator[Row] with Logging {
+
+ val schema: ResolvedSchema = resultFetcher.getResultSchema
+
+ val dataTypes: util.List[DataType] = schema.getColumnDataTypes
+
+ var token: Long = 0
+
+ var pos: Long = 0
+
+ var fetchStart: Long = 0
+
+ var bufferedRows: Array[Row] = new Array[Row](0)
+
+ var hasNext: Boolean = true
+
+ val FETCH_INTERVAL_MS: Long = 1000
+
+ private val executor = Executors.newSingleThreadScheduledExecutor(
+ new ThreadFactoryBuilder().setNameFormat("flink-query-iterator-%d").setDaemon(true).build)
+
+ implicit private val executionContext: ExecutionContextExecutor =
+ ExecutionContext.fromExecutor(executor)
+
+ /**
+ * Begin a fetch block, moving forward from the current position.
+ *
+ * Throws TimeoutException if no data is fetched within the timeout.
+ */
+ override def fetchNext(): Unit = {
+ if (!hasNext) {
+ return
+ }
+ val future = Future {
+ var fetched = false
+ // if no timeout is set, this would block until some rows are fetched
+ debug(s"Fetching from result store with timeout $resultFetchTimeout")
+ while (!fetched && !Thread.interrupted()) {
+ val rs = resultFetcher.fetchResults(token, maxRows - bufferedRows.length)
+ val flinkRs = new FlinkResultSet(rs)
+ // TODO: replace string-based match when Flink 1.16 support is dropped
+ flinkRs.getResultType.name() match {
+ case "EOS" =>
+ debug("EOS received, no more data to fetch.")
+ fetched = true
+ hasNext = false
+ case "NOT_READY" =>
+ // if flink jobs are not ready, continue to retry
+ debug("Result not ready, retrying...")
+ case "PAYLOAD" =>
+ val fetchedData = flinkRs.getData
+ // if no data fetched, continue to retry
+ if (!fetchedData.isEmpty) {
+ debug(s"Fetched ${fetchedData.length} rows from result store.")
+ fetched = true
+ bufferedRows ++= fetchedData.map(rd => convertToRow(rd, dataTypes.toList))
+ fetchStart = pos
+ } else {
+ debug("No data fetched, retrying...")
+ }
+ case _ =>
+ throw new RuntimeException(s"Unexpected result type: ${flinkRs.getResultType}")
+ }
+ if (hasNext) {
+ val nextToken = flinkRs.getNextToken
+ if (nextToken == null) {
+ hasNext = false
+ } else {
+ token = nextToken
+ }
+ }
+ Thread.sleep(FETCH_INTERVAL_MS)
+ }
+ }
+ Await.result(future, resultFetchTimeout)
+ }
+
+ /**
+ * Begin a fetch block, moving the iterator to the given position.
+ * Resets the fetch start offset.
+ *
+ * @param pos index to move a position of iterator.
+ */
+ override def fetchAbsolute(pos: Long): Unit = {
+ val effectivePos = Math.max(pos, 0)
+ if (effectivePos < bufferedRows.length) {
+ this.fetchStart = effectivePos
+ return
+ }
+ throw new IllegalArgumentException(s"Cannot skip to an unreachable position $effectivePos.")
+ }
+
+ override def getFetchStart: Long = fetchStart
+
+ override def getPosition: Long = pos
+
+ /**
+ * @return the next row if any, or null if no more rows can be fetched.
+ */
+ override def next(): Row = {
+ if (pos < bufferedRows.length) {
+ debug(s"Fetching from buffered rows at pos $pos.")
+ val row = bufferedRows(pos.toInt)
+ pos += 1
+ if (pos >= maxRows) {
+ hasNext = false
+ }
+ row
+ } else {
+ // block until some rows are fetched or TimeoutException is thrown
+ fetchNext()
+ if (hasNext) {
+ val row = bufferedRows(pos.toInt)
+ pos += 1
+ if (pos >= maxRows) {
+ hasNext = false
+ }
+ row
+ } else {
+ null
+ }
+ }
+ }
+
+ def close(): Unit = {
+ resultFetcher.close()
+ executor.shutdown()
+ }
+
+ private[this] def convertToRow(r: RowData, dataTypes: List[DataType]): Row = {
+ val converter = DataStructureConverters.getConverter(DataTypes.ROW(dataTypes: _*))
+ converter.toExternal(r).asInstanceOf[Row]
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSet.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSet.scala
index 13673381258..b8d407297ac 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSet.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSet.scala
@@ -22,7 +22,8 @@ import java.util
import scala.collection.JavaConverters._
import com.google.common.collect.Iterators
-import org.apache.flink.table.api.{ResultKind, TableResult}
+import org.apache.flink.api.common.JobID
+import org.apache.flink.table.api.{DataTypes, ResultKind}
import org.apache.flink.table.catalog.Column
import org.apache.flink.types.Row
@@ -49,6 +50,13 @@ case class ResultSet(
def getColumns: util.List[Column] = columns
def getData: FetchIterator[Row] = data
+
+ def close(): Unit = {
+ data match {
+ case queryIte: QueryResultFetchIterator => queryIte.close()
+ case _ =>
+ }
+ }
}
/**
@@ -57,14 +65,17 @@ case class ResultSet(
*/
object ResultSet {
- def fromTableResult(tableResult: TableResult): ResultSet = {
- val schema = tableResult.getResolvedSchema
- // collect all rows from table result as list
- // this is ok as TableResult contains limited rows
- val rows = tableResult.collect.asScala.toArray
- builder.resultKind(tableResult.getResultKind)
- .columns(schema.getColumns)
- .data(rows)
+ def fromJobId(jobID: JobID): ResultSet = {
+ val data: Array[Row] = if (jobID != null) {
+ Array(Row.of(jobID.toString))
+ } else {
+ // should not happen
+ Array(Row.of("(Empty Job ID)"))
+ }
+ builder
+ .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
+ .columns(Column.physical("result", DataTypes.STRING()))
+ .data(data)
.build
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSetUtil.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSetUtil.scala
index ded271cf1d7..8b722f1e5e9 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSetUtil.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/result/ResultSetUtil.scala
@@ -15,11 +15,14 @@
* limitations under the License.
*/
-package org.apache.kyuubi.engine.flink.result;
+package org.apache.kyuubi.engine.flink.result
+
+import scala.concurrent.duration.Duration
import org.apache.flink.table.api.DataTypes
import org.apache.flink.table.api.ResultKind
import org.apache.flink.table.catalog.Column
+import org.apache.flink.table.gateway.service.result.ResultFetcher
import org.apache.flink.types.Row
/** Utility object for building ResultSet. */
@@ -54,4 +57,20 @@ object ResultSetUtil {
.columns(Column.physical("result", DataTypes.STRING))
.data(Array[Row](Row.of("OK")))
.build
+
+ def fromResultFetcher(
+ resultFetcher: ResultFetcher,
+ maxRows: Int,
+ resultFetchTimeout: Duration): ResultSet = {
+ if (maxRows <= 0) {
+ throw new IllegalArgumentException("maxRows should be positive")
+ }
+ val schema = resultFetcher.getResultSchema
+ val ite = new QueryResultFetchIterator(resultFetcher, maxRows, resultFetchTimeout)
+ ResultSet.builder
+ .resultKind(ResultKind.SUCCESS_WITH_CONTENT)
+ .columns(schema.getColumns)
+ .data(ite)
+ .build
+ }
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/schema/RowSet.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/schema/RowSet.scala
index ad83f9c2ba2..c446396d5bb 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/schema/RowSet.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/schema/RowSet.scala
@@ -21,7 +21,9 @@ import java.{lang, util}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.sql.{Date, Timestamp}
-import java.time.{LocalDate, LocalDateTime}
+import java.time.{Instant, LocalDate, LocalDateTime, ZonedDateTime, ZoneId}
+import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder, TextStyle}
+import java.time.temporal.ChronoField
import java.util.Collections
import scala.collection.JavaConverters._
@@ -42,15 +44,16 @@ object RowSet {
def resultSetToTRowSet(
rows: Seq[Row],
resultSet: ResultSet,
+ zoneId: ZoneId,
protocolVersion: TProtocolVersion): TRowSet = {
if (protocolVersion.getValue < TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V6.getValue) {
- toRowBaseSet(rows, resultSet)
+ toRowBaseSet(rows, resultSet, zoneId)
} else {
- toColumnBasedSet(rows, resultSet)
+ toColumnBasedSet(rows, resultSet, zoneId)
}
}
- def toRowBaseSet(rows: Seq[Row], resultSet: ResultSet): TRowSet = {
+ def toRowBaseSet(rows: Seq[Row], resultSet: ResultSet, zoneId: ZoneId): TRowSet = {
val rowSize = rows.size
val tRows = new util.ArrayList[TRow](rowSize)
var i = 0
@@ -60,7 +63,7 @@ object RowSet {
val columnSize = row.getArity
var j = 0
while (j < columnSize) {
- val columnValue = toTColumnValue(j, row, resultSet)
+ val columnValue = toTColumnValue(j, row, resultSet, zoneId)
tRow.addToColVals(columnValue)
j += 1
}
@@ -71,14 +74,14 @@ object RowSet {
new TRowSet(0, tRows)
}
- def toColumnBasedSet(rows: Seq[Row], resultSet: ResultSet): TRowSet = {
+ def toColumnBasedSet(rows: Seq[Row], resultSet: ResultSet, zoneId: ZoneId): TRowSet = {
val size = rows.length
val tRowSet = new TRowSet(0, new util.ArrayList[TRow](size))
val columnSize = resultSet.getColumns.size()
var i = 0
while (i < columnSize) {
val field = resultSet.getColumns.get(i)
- val tColumn = toTColumn(rows, i, field.getDataType.getLogicalType)
+ val tColumn = toTColumn(rows, i, field.getDataType.getLogicalType, zoneId)
tRowSet.addToColumns(tColumn)
i += 1
}
@@ -88,7 +91,8 @@ object RowSet {
private def toTColumnValue(
ordinal: Int,
row: Row,
- resultSet: ResultSet): TColumnValue = {
+ resultSet: ResultSet,
+ zoneId: ZoneId): TColumnValue = {
val column = resultSet.getColumns.get(ordinal)
val logicalType = column.getDataType.getLogicalType
@@ -153,6 +157,12 @@ object RowSet {
s"for type ${t.getClass}.")
}
TColumnValue.stringVal(tStringValue)
+ case _: LocalZonedTimestampType =>
+ val tStringValue = new TStringValue
+ val fieldValue = row.getField(ordinal)
+ tStringValue.setValue(TIMESTAMP_LZT_FORMATTER.format(
+ ZonedDateTime.ofInstant(fieldValue.asInstanceOf[Instant], zoneId)))
+ TColumnValue.stringVal(tStringValue)
case t =>
val tStringValue = new TStringValue
if (row.getField(ordinal) != null) {
@@ -166,7 +176,11 @@ object RowSet {
ByteBuffer.wrap(bitSet.toByteArray)
}
- private def toTColumn(rows: Seq[Row], ordinal: Int, logicalType: LogicalType): TColumn = {
+ private def toTColumn(
+ rows: Seq[Row],
+ ordinal: Int,
+ logicalType: LogicalType,
+ zoneId: ZoneId): TColumn = {
val nulls = new java.util.BitSet()
// for each column, determine the conversion class by sampling the first non-null value
// if there's no row, set the entire column empty
@@ -211,6 +225,12 @@ object RowSet {
s"for type ${t.getClass}.")
}
TColumn.stringVal(new TStringColumn(values, nulls))
+ case _: LocalZonedTimestampType =>
+ val values = getOrSetAsNull[Instant](rows, ordinal, nulls, Instant.EPOCH)
+ .toArray().map(v =>
+ TIMESTAMP_LZT_FORMATTER.format(
+ ZonedDateTime.ofInstant(v.asInstanceOf[Instant], zoneId)))
+ TColumn.stringVal(new TStringColumn(values.toList.asJava, nulls))
case _ =>
var i = 0
val rowSize = rows.length
@@ -303,12 +323,14 @@ object RowSet {
case _: DecimalType => TTypeId.DECIMAL_TYPE
case _: DateType => TTypeId.DATE_TYPE
case _: TimestampType => TTypeId.TIMESTAMP_TYPE
+ case _: LocalZonedTimestampType => TTypeId.TIMESTAMPLOCALTZ_TYPE
case _: ArrayType => TTypeId.ARRAY_TYPE
case _: MapType => TTypeId.MAP_TYPE
case _: RowType => TTypeId.STRUCT_TYPE
case _: BinaryType => TTypeId.BINARY_TYPE
+ case _: VarBinaryType => TTypeId.BINARY_TYPE
case _: TimeType => TTypeId.STRING_TYPE
- case t @ (_: ZonedTimestampType | _: LocalZonedTimestampType | _: MultisetType |
+ case t @ (_: ZonedTimestampType | _: MultisetType |
_: YearMonthIntervalType | _: DayTimeIntervalType) =>
throw new IllegalArgumentException(
"Flink data type `%s` is not supported currently".format(t.asSummaryString()),
@@ -369,11 +391,33 @@ object RowSet {
// Only match string in nested type values
"\"" + s + "\""
- case (bin: Array[Byte], _: BinaryType) =>
+ case (bin: Array[Byte], _ @(_: BinaryType | _: VarBinaryType)) =>
new String(bin, StandardCharsets.UTF_8)
case (other, _) =>
other.toString
}
}
+
+ /** Should stay in sync with org.apache.kyuubi.jdbc.hive.common.TimestampTZUtil. */
+ var TIMESTAMP_LZT_FORMATTER: DateTimeFormatter = {
+ val builder = new DateTimeFormatterBuilder
+ // Date part
+ builder.append(DateTimeFormatter.ofPattern("yyyy-MM-dd"))
+ // Time part
+ builder
+ .optionalStart
+ .appendLiteral(" ")
+ .append(DateTimeFormatter.ofPattern("HH:mm:ss"))
+ .optionalStart
+ .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
+ .optionalEnd
+ .optionalEnd
+
+ // Zone part
+ builder.optionalStart.appendLiteral(" ").optionalEnd
+ builder.optionalStart.appendZoneText(TextStyle.NARROW).optionalEnd
+
+ builder.toFormatter
+ }
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSQLSessionManager.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSQLSessionManager.scala
index 07971e39fae..b7cd462172f 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSQLSessionManager.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSQLSessionManager.scala
@@ -17,12 +17,17 @@
package org.apache.kyuubi.engine.flink.session
-import org.apache.flink.table.client.gateway.context.DefaultContext
-import org.apache.flink.table.client.gateway.local.LocalExecutor
+import scala.collection.JavaConverters._
+import scala.collection.JavaConverters.mapAsJavaMap
+
+import org.apache.flink.table.gateway.api.session.SessionEnvironment
+import org.apache.flink.table.gateway.rest.util.SqlGatewayRestAPIVersion
+import org.apache.flink.table.gateway.service.context.DefaultContext
import org.apache.hive.service.rpc.thrift.TProtocolVersion
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_HANDLE_KEY
import org.apache.kyuubi.engine.flink.operation.FlinkSQLOperationManager
+import org.apache.kyuubi.engine.flink.shim.FlinkSessionManager
import org.apache.kyuubi.session.{Session, SessionHandle, SessionManager}
class FlinkSQLSessionManager(engineContext: DefaultContext)
@@ -31,11 +36,11 @@ class FlinkSQLSessionManager(engineContext: DefaultContext)
override protected def isServer: Boolean = false
val operationManager = new FlinkSQLOperationManager()
- val executor = new LocalExecutor(engineContext)
+ val sessionManager = new FlinkSessionManager(engineContext)
override def start(): Unit = {
super.start()
- executor.start()
+ sessionManager.start()
}
override protected def createSession(
@@ -46,19 +51,40 @@ class FlinkSQLSessionManager(engineContext: DefaultContext)
conf: Map[String, String]): Session = {
conf.get(KYUUBI_SESSION_HANDLE_KEY).map(SessionHandle.fromUUID).flatMap(
getSessionOption).getOrElse {
- new FlinkSessionImpl(
+ val flinkInternalSession = sessionManager.openSession(
+ SessionEnvironment.newBuilder
+ .setSessionEndpointVersion(SqlGatewayRestAPIVersion.V1)
+ .addSessionConfig(mapAsJavaMap(conf))
+ .build)
+ val sessionConfig = flinkInternalSession.getSessionConfig
+ sessionConfig.putAll(conf.asJava)
+ val session = new FlinkSessionImpl(
protocol,
user,
password,
ipAddress,
conf,
this,
- executor)
+ flinkInternalSession)
+ session
}
}
+ override def getSessionOption(sessionHandle: SessionHandle): Option[Session] = {
+ val session = super.getSessionOption(sessionHandle)
+ session.foreach(s => s.asInstanceOf[FlinkSessionImpl].fSession.touch())
+ session
+ }
+
override def closeSession(sessionHandle: SessionHandle): Unit = {
+ val fSession = super.getSessionOption(sessionHandle)
+ fSession.foreach(s =>
+ sessionManager.closeSession(s.asInstanceOf[FlinkSessionImpl].fSession.getSessionHandle))
super.closeSession(sessionHandle)
- executor.closeSession(sessionHandle.toString)
+ }
+
+ override def stop(): Unit = synchronized {
+ sessionManager.stop()
+ super.stop()
}
}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSessionImpl.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSessionImpl.scala
index 75087b48ca2..b8d1f85692b 100644
--- a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSessionImpl.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/session/FlinkSessionImpl.scala
@@ -19,16 +19,19 @@ package org.apache.kyuubi.engine.flink.session
import scala.util.control.NonFatal
+import org.apache.flink.configuration.Configuration
import org.apache.flink.runtime.util.EnvironmentInformation
import org.apache.flink.table.client.gateway.SqlExecutionException
-import org.apache.flink.table.client.gateway.context.SessionContext
-import org.apache.flink.table.client.gateway.local.LocalExecutor
+import org.apache.flink.table.gateway.api.operation.OperationHandle
+import org.apache.flink.table.gateway.service.context.SessionContext
+import org.apache.flink.table.gateway.service.session.{Session => FSession}
import org.apache.hive.service.rpc.thrift.{TGetInfoType, TGetInfoValue, TProtocolVersion}
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_HANDLE_KEY
import org.apache.kyuubi.engine.flink.FlinkEngineUtils
-import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager}
+import org.apache.kyuubi.engine.flink.udf.KDFRegistry
+import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager, USE_CATALOG, USE_DATABASE}
class FlinkSessionImpl(
protocol: TProtocolVersion,
@@ -37,16 +40,19 @@ class FlinkSessionImpl(
ipAddress: String,
conf: Map[String, String],
sessionManager: SessionManager,
- val executor: LocalExecutor)
+ val fSession: FSession)
extends AbstractSession(protocol, user, password, ipAddress, conf, sessionManager) {
override val handle: SessionHandle =
- conf.get(KYUUBI_SESSION_HANDLE_KEY).map(SessionHandle.fromUUID).getOrElse(SessionHandle())
+ conf.get(KYUUBI_SESSION_HANDLE_KEY).map(SessionHandle.fromUUID)
+ .getOrElse(SessionHandle.fromUUID(fSession.getSessionHandle.getIdentifier.toString))
- lazy val sessionContext: SessionContext = {
- FlinkEngineUtils.getSessionContext(executor, handle.identifier.toString)
+ val sessionContext: SessionContext = {
+ FlinkEngineUtils.getSessionContext(fSession)
}
+ KDFRegistry.registerAll(sessionContext)
+
private def setModifiableConfig(key: String, value: String): Unit = {
try {
sessionContext.set(key, value)
@@ -56,26 +62,33 @@ class FlinkSessionImpl(
}
override def open(): Unit = {
- executor.openSession(handle.identifier.toString)
- normalizedConf.foreach {
- case ("use:catalog", catalog) =>
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- try {
- tableEnv.useCatalog(catalog)
- } catch {
- case NonFatal(e) =>
+ val executor = fSession.createExecutor(Configuration.fromMap(fSession.getSessionConfig))
+
+ val (useCatalogAndDatabaseConf, otherConf) = normalizedConf.partition { case (k, _) =>
+ Array(USE_CATALOG, USE_DATABASE).contains(k)
+ }
+
+ useCatalogAndDatabaseConf.get(USE_CATALOG).foreach { catalog =>
+ try {
+ executor.executeStatement(OperationHandle.create, s"USE CATALOG $catalog")
+ } catch {
+ case NonFatal(e) =>
+ throw e
+ }
+ }
+
+ useCatalogAndDatabaseConf.get(USE_DATABASE).foreach { database =>
+ try {
+ executor.executeStatement(OperationHandle.create, s"USE $database")
+ } catch {
+ case NonFatal(e) =>
+ if (database != "default") {
throw e
- }
- case ("use:database", database) =>
- val tableEnv = sessionContext.getExecutionContext.getTableEnvironment
- try {
- tableEnv.useDatabase(database)
- } catch {
- case NonFatal(e) =>
- if (database != "default") {
- throw e
- }
- }
+ }
+ }
+ }
+
+ otherConf.foreach {
case (key, value) => setModifiableConfig(key, value)
}
super.open()
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkResultSet.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkResultSet.scala
new file mode 100644
index 00000000000..7fb05c8446b
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkResultSet.scala
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.shim
+
+import java.lang.{Long => JLong}
+import java.util
+
+import org.apache.flink.table.data.RowData
+import org.apache.flink.table.gateway.api.results.ResultSet.ResultType
+
+import org.apache.kyuubi.util.reflect.ReflectUtils._
+
+class FlinkResultSet(resultSet: AnyRef) {
+
+ def getData: util.List[RowData] = invokeAs(resultSet, "getData")
+
+ def getNextToken: JLong = invokeAs(resultSet, "getNextToken")
+
+ def getResultType: ResultType = invokeAs(resultSet, "getResultType")
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkSessionManager.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkSessionManager.scala
new file mode 100644
index 00000000000..89414ac4c54
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/shim/FlinkSessionManager.scala
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.shim
+
+import org.apache.flink.table.gateway.api.session.{SessionEnvironment, SessionHandle}
+import org.apache.flink.table.gateway.service.context.DefaultContext
+import org.apache.flink.table.gateway.service.session.Session
+
+import org.apache.kyuubi.engine.flink.FlinkEngineUtils.FLINK_RUNTIME_VERSION
+import org.apache.kyuubi.util.reflect._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
+
+class FlinkSessionManager(engineContext: DefaultContext) {
+
+ val sessionManager: AnyRef = {
+ if (FLINK_RUNTIME_VERSION === "1.16") {
+ DynConstructors.builder().impl(
+ "org.apache.flink.table.gateway.service.session.SessionManager",
+ classOf[DefaultContext])
+ .build()
+ .newInstance(engineContext)
+ } else {
+ DynConstructors.builder().impl(
+ "org.apache.flink.table.gateway.service.session.SessionManagerImpl",
+ classOf[DefaultContext])
+ .build()
+ .newInstance(engineContext)
+ }
+ }
+
+ def start(): Unit = invokeAs(sessionManager, "start")
+
+ def stop(): Unit = invokeAs(sessionManager, "stop")
+
+ def getSession(sessionHandle: SessionHandle): Session =
+ invokeAs(sessionManager, "getSession", (classOf[SessionHandle], sessionHandle))
+
+ def openSession(environment: SessionEnvironment): Session =
+ invokeAs(sessionManager, "openSession", (classOf[SessionEnvironment], environment))
+
+ def closeSession(sessionHandle: SessionHandle): Unit =
+ invokeAs(sessionManager, "closeSession", (classOf[SessionHandle], sessionHandle))
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KDFRegistry.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KDFRegistry.scala
new file mode 100644
index 00000000000..9ccbe7940d0
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KDFRegistry.scala
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.udf
+
+import java.util
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.table.functions.{ScalarFunction, UserDefinedFunction}
+import org.apache.flink.table.gateway.service.context.SessionContext
+
+import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_NAME, KYUUBI_SESSION_USER_KEY}
+import org.apache.kyuubi.engine.flink.FlinkEngineUtils.FLINK_RUNTIME_VERSION
+import org.apache.kyuubi.util.reflect.DynMethods
+
+object KDFRegistry {
+
+ def createKyuubiDefinedFunctions(sessionContext: SessionContext): Array[KyuubiDefinedFunction] = {
+
+ val kyuubiDefinedFunctions = new ArrayBuffer[KyuubiDefinedFunction]
+
+ val flinkConfigMap: util.Map[String, String] = {
+ if (FLINK_RUNTIME_VERSION === "1.16") {
+ DynMethods
+ .builder("getConfigMap")
+ .impl(classOf[SessionContext])
+ .build()
+ .invoke(sessionContext)
+ .asInstanceOf[util.Map[String, String]]
+ } else {
+ DynMethods
+ .builder("getSessionConf")
+ .impl(classOf[SessionContext])
+ .build()
+ .invoke(sessionContext)
+ .asInstanceOf[Configuration]
+ .toMap
+ }
+ }
+
+ val kyuubi_version: KyuubiDefinedFunction = create(
+ "kyuubi_version",
+ new KyuubiVersionFunction(flinkConfigMap),
+ "Return the version of Kyuubi Server",
+ "string",
+ "1.8.0")
+ kyuubiDefinedFunctions += kyuubi_version
+
+ val engineName: KyuubiDefinedFunction = create(
+ "kyuubi_engine_name",
+ new EngineNameFunction(flinkConfigMap),
+ "Return the application name for the associated query engine",
+ "string",
+ "1.8.0")
+ kyuubiDefinedFunctions += engineName
+
+ val engineId: KyuubiDefinedFunction = create(
+ "kyuubi_engine_id",
+ new EngineIdFunction(flinkConfigMap),
+ "Return the application id for the associated query engine",
+ "string",
+ "1.8.0")
+ kyuubiDefinedFunctions += engineId
+
+ val systemUser: KyuubiDefinedFunction = create(
+ "kyuubi_system_user",
+ new SystemUserFunction(flinkConfigMap),
+ "Return the system user name for the associated query engine",
+ "string",
+ "1.8.0")
+ kyuubiDefinedFunctions += systemUser
+
+ val sessionUser: KyuubiDefinedFunction = create(
+ "kyuubi_session_user",
+ new SessionUserFunction(flinkConfigMap),
+ "Return the session username for the associated query engine",
+ "string",
+ "1.8.0")
+ kyuubiDefinedFunctions += sessionUser
+
+ kyuubiDefinedFunctions.toArray
+ }
+
+ def create(
+ name: String,
+ udf: UserDefinedFunction,
+ description: String,
+ returnType: String,
+ since: String): KyuubiDefinedFunction = {
+ val kdf = KyuubiDefinedFunction(name, udf, description, returnType, since)
+ kdf
+ }
+
+ def registerAll(sessionContext: SessionContext): Unit = {
+ val functions = createKyuubiDefinedFunctions(sessionContext)
+ for (func <- functions) {
+ sessionContext.getSessionState.functionCatalog
+ .registerTemporarySystemFunction(func.name, func.udf, true)
+ }
+ }
+}
+
+class KyuubiVersionFunction(confMap: util.Map[String, String]) extends ScalarFunction {
+ def eval(): String = KYUUBI_VERSION
+}
+
+class EngineNameFunction(confMap: util.Map[String, String]) extends ScalarFunction {
+ def eval(): String = {
+ confMap match {
+ case m if m.containsKey("yarn.application.name") => m.get("yarn.application.name")
+ case m if m.containsKey("kubernetes.cluster-id") => m.get("kubernetes.cluster-id")
+ case m => m.getOrDefault(KYUUBI_ENGINE_NAME, "unknown-engine-name")
+ }
+ }
+}
+
+class EngineIdFunction(confMap: util.Map[String, String]) extends ScalarFunction {
+ def eval(): String = {
+ confMap match {
+ case m if m.containsKey("yarn.application.id") => m.get("yarn.application.id")
+ case m if m.containsKey("kubernetes.cluster-id") => m.get("kubernetes.cluster-id")
+ case m => m.getOrDefault("high-availability.cluster-id", "unknown-engine-id")
+ }
+ }
+}
+
+class SystemUserFunction(confMap: util.Map[String, String]) extends ScalarFunction {
+ def eval(): String = Utils.currentUser
+}
+
+class SessionUserFunction(confMap: util.Map[String, String]) extends ScalarFunction {
+ def eval(): String = confMap.getOrDefault(KYUUBI_SESSION_USER_KEY, "unknown-user")
+}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkHudiOperationSuite.scala b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KyuubiDefinedFunction.scala
similarity index 65%
rename from externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkHudiOperationSuite.scala
rename to externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KyuubiDefinedFunction.scala
index c5e8be37aa4..5cfce86d6e0 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkHudiOperationSuite.scala
+++ b/externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/udf/KyuubiDefinedFunction.scala
@@ -15,16 +15,20 @@
* limitations under the License.
*/
-package org.apache.kyuubi.engine.spark.operation
+package org.apache.kyuubi.engine.flink.udf
-import org.apache.kyuubi.engine.spark.WithSparkSQLEngine
-import org.apache.kyuubi.operation.HudiMetadataTests
-import org.apache.kyuubi.tags.HudiTest
+import org.apache.flink.table.functions.UserDefinedFunction
-@HudiTest
-class SparkHudiOperationSuite extends WithSparkSQLEngine with HudiMetadataTests {
-
- override protected def jdbcUrl: String = getJdbcUrl
-
- override def withKyuubiConf: Map[String, String] = extraConfigs
-}
+/**
+ * A wrapper for Flink's [[UserDefinedFunction]]
+ *
+ * @param name function name
+ * @param udf user-defined function
+ * @param description function description
+ */
+case class KyuubiDefinedFunction(
+ name: String,
+ udf: UserDefinedFunction,
+ description: String,
+ returnType: String,
+ since: String)
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithDiscoveryFlinkSQLEngine.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithDiscoveryFlinkSQLEngine.scala
new file mode 100644
index 00000000000..c352429eadc
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithDiscoveryFlinkSQLEngine.scala
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink
+
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.ha.client.{DiscoveryClient, DiscoveryClientProvider}
+
+trait WithDiscoveryFlinkSQLEngine {
+
+ protected def namespace: String
+
+ protected def conf: KyuubiConf
+
+ def withDiscoveryClient(f: DiscoveryClient => Unit): Unit = {
+ DiscoveryClientProvider.withDiscoveryClient(conf)(f)
+ }
+
+ def getFlinkEngineServiceUrl: String = {
+ var hostPort: Option[(String, Int)] = None
+ var retries = 0
+ while (hostPort.isEmpty && retries < 10) {
+ withDiscoveryClient(client => hostPort = client.getServerHost(namespace))
+ retries += 1
+ Thread.sleep(1000L)
+ }
+ if (hostPort.isEmpty) {
+ throw new RuntimeException("Timed out retrieving Flink engine service URL.")
+ }
+ // delay access to the thrift service because it
+ // may not be ready even though it's already registered
+ Thread.sleep(3000L)
+ s"jdbc:hive2://${hostPort.get._1}:${hostPort.get._2}"
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngine.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngine.scala
deleted file mode 100644
index fbfb8df29ac..00000000000
--- a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngine.scala
+++ /dev/null
@@ -1,104 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.engine.flink
-
-import scala.collection.JavaConverters._
-
-import org.apache.flink.client.cli.{CustomCommandLine, DefaultCLI}
-import org.apache.flink.configuration.{Configuration, RestOptions}
-import org.apache.flink.runtime.minicluster.{MiniCluster, MiniClusterConfiguration}
-import org.apache.flink.table.client.gateway.context.DefaultContext
-
-import org.apache.kyuubi.{KyuubiFunSuite, Utils}
-import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.flink.util.TestUserClassLoaderJar
-
-trait WithFlinkSQLEngine extends KyuubiFunSuite {
-
- protected val flinkConfig = new Configuration()
- protected var miniCluster: MiniCluster = _
- protected var engine: FlinkSQLEngine = _
- // conf will be loaded until start flink engine
- def withKyuubiConf: Map[String, String]
- val kyuubiConf: KyuubiConf = FlinkSQLEngine.kyuubiConf
-
- protected var connectionUrl: String = _
-
- protected val GENERATED_UDF_CLASS: String = "LowerUDF"
-
- protected val GENERATED_UDF_CODE: String =
- s"""
- public class $GENERATED_UDF_CLASS extends org.apache.flink.table.functions.ScalarFunction {
- public String eval(String str) {
- return str.toLowerCase();
- }
- }
- """
-
- override def beforeAll(): Unit = {
- startMiniCluster()
- startFlinkEngine()
- super.beforeAll()
- }
-
- override def afterAll(): Unit = {
- super.afterAll()
- stopFlinkEngine()
- miniCluster.close()
- }
-
- def startFlinkEngine(): Unit = {
- withKyuubiConf.foreach { case (k, v) =>
- System.setProperty(k, v)
- kyuubiConf.set(k, v)
- }
- val udfJar = TestUserClassLoaderJar.createJarFile(
- Utils.createTempDir("test-jar").toFile,
- "test-classloader-udf.jar",
- GENERATED_UDF_CLASS,
- GENERATED_UDF_CODE)
- val engineContext = new DefaultContext(
- List(udfJar.toURI.toURL).asJava,
- flinkConfig,
- List[CustomCommandLine](new DefaultCLI).asJava)
- FlinkSQLEngine.startEngine(engineContext)
- engine = FlinkSQLEngine.currentEngine.get
- connectionUrl = engine.frontendServices.head.connectionUrl
- }
-
- def stopFlinkEngine(): Unit = {
- if (engine != null) {
- engine.stop()
- engine = null
- }
- }
-
- private def startMiniCluster(): Unit = {
- val cfg = new MiniClusterConfiguration.Builder()
- .setConfiguration(flinkConfig)
- .setNumSlotsPerTaskManager(1)
- .build
- miniCluster = new MiniCluster(cfg)
- miniCluster.start()
- flinkConfig.setString(RestOptions.ADDRESS, miniCluster.getRestAddress.get().getHost)
- flinkConfig.setInteger(RestOptions.PORT, miniCluster.getRestAddress.get().getPort)
- }
-
- protected def getJdbcUrl: String = s"jdbc:hive2://$connectionUrl/;"
-
-}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineLocal.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineLocal.scala
new file mode 100644
index 00000000000..92c1bcd83fc
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineLocal.scala
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink
+
+import java.io.{File, FilenameFilter}
+import java.lang.ProcessBuilder.Redirect
+import java.net.URI
+import java.nio.file.{Files, Paths}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.flink.configuration.{Configuration, RestOptions}
+import org.apache.flink.runtime.minicluster.{MiniCluster, MiniClusterConfiguration}
+
+import org.apache.kyuubi.{KYUUBI_VERSION, KyuubiException, KyuubiFunSuite, SCALA_COMPILE_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf._
+import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ADDRESSES
+import org.apache.kyuubi.zookeeper.EmbeddedZookeeper
+import org.apache.kyuubi.zookeeper.ZookeeperConf.{ZK_CLIENT_PORT, ZK_CLIENT_PORT_ADDRESS}
+
+trait WithFlinkSQLEngineLocal extends KyuubiFunSuite with WithFlinkTestResources {
+
+ protected val flinkConfig = new Configuration()
+
+ protected var miniCluster: MiniCluster = _
+
+ protected var engineProcess: Process = _
+
+ private var zkServer: EmbeddedZookeeper = _
+
+ protected val conf: KyuubiConf = FlinkSQLEngine.kyuubiConf
+
+ protected def engineRefId: String
+
+ def withKyuubiConf: Map[String, String]
+
+ protected var connectionUrl: String = _
+
+ override def beforeAll(): Unit = {
+ withKyuubiConf.foreach { case (k, v) =>
+ if (k.startsWith("flink.")) {
+ flinkConfig.setString(k.stripPrefix("flink."), v)
+ }
+ }
+ withKyuubiConf.foreach { case (k, v) =>
+ System.setProperty(k, v)
+ conf.set(k, v)
+ }
+
+ zkServer = new EmbeddedZookeeper()
+ conf.set(ZK_CLIENT_PORT, 0).set(ZK_CLIENT_PORT_ADDRESS, "localhost")
+ zkServer.initialize(conf)
+ zkServer.start()
+ conf.set(HA_ADDRESSES, zkServer.getConnectString)
+
+ val envs = scala.collection.mutable.Map[String, String]()
+ val kyuubiExternals = Utils.getCodeSourceLocation(getClass)
+ .split("externals").head
+ val flinkHome = {
+ val candidates = Paths.get(kyuubiExternals, "externals", "kyuubi-download", "target")
+ .toFile.listFiles(f => f.getName.contains("flink"))
+ if (candidates == null) None else candidates.map(_.toPath).headOption
+ }
+ if (flinkHome.isDefined) {
+ envs("FLINK_HOME") = flinkHome.get.toString
+ envs("FLINK_CONF_DIR") = Paths.get(flinkHome.get.toString, "conf").toString
+ }
+ envs("JAVA_HOME") = System.getProperty("java.home")
+ envs("JAVA_EXEC") = Paths.get(envs("JAVA_HOME"), "bin", "java").toString
+
+ startMiniCluster()
+ startFlinkEngine(envs.toMap)
+ super.beforeAll()
+ }
+
+ override def afterAll(): Unit = {
+ super.afterAll()
+ if (engineProcess != null) {
+ engineProcess.destroy()
+ engineProcess = null
+ }
+ if (miniCluster != null) {
+ miniCluster.close()
+ miniCluster = null
+ }
+ if (zkServer != null) {
+ zkServer.stop()
+ zkServer = null
+ }
+ }
+
+ def startFlinkEngine(envs: Map[String, String]): Unit = {
+ val flinkHome = envs("FLINK_HOME")
+ val processBuilder: ProcessBuilder = new ProcessBuilder
+ processBuilder.environment().putAll(envs.asJava)
+
+ conf.set(ENGINE_FLINK_EXTRA_CLASSPATH, udfJar.getAbsolutePath)
+ val command = new ArrayBuffer[String]()
+
+ command += envs("JAVA_EXEC")
+
+ val memory = conf.get(ENGINE_FLINK_MEMORY)
+ command += s"-Xmx$memory"
+ val javaOptions = conf.get(ENGINE_FLINK_JAVA_OPTIONS)
+ if (javaOptions.isDefined) {
+ command += javaOptions.get
+ }
+
+ command += "-cp"
+ val classpathEntries = new java.util.LinkedHashSet[String]
+ // flink engine runtime jar
+ mainResource(envs).foreach(classpathEntries.add)
+ // flink sql jars
+ Paths.get(flinkHome)
+ .resolve("opt")
+ .toFile
+ .listFiles(new FilenameFilter {
+ override def accept(dir: File, name: String): Boolean = {
+ name.toLowerCase.startsWith("flink-sql-client") ||
+ name.toLowerCase.startsWith("flink-sql-gateway")
+ }
+ }).foreach(jar => classpathEntries.add(jar.getAbsolutePath))
+
+ // jars from flink lib
+ classpathEntries.add(s"$flinkHome${File.separator}lib${File.separator}*")
+
+ // classpath contains flink configurations, defaulting to $FLINK_HOME/conf
+ classpathEntries.add(envs.getOrElse("FLINK_CONF_DIR", ""))
+ // classpath contains hadoop configurations
+ val cp = System.getProperty("java.class.path")
+ // exclude kyuubi flink engine jar that has SPI for EmbeddedExecutorFactory
+ // which can't be initialized on the client side
+ val hadoopJars = cp.split(":").filter(s => !s.contains("flink"))
+ hadoopJars.foreach(classpathEntries.add)
+ val extraCp = conf.get(ENGINE_FLINK_EXTRA_CLASSPATH)
+ extraCp.foreach(classpathEntries.add)
+ if (hadoopJars.isEmpty && extraCp.isEmpty) {
+ mainResource(envs).foreach { path =>
+ val devHadoopJars = Paths.get(path).getParent
+ .resolve(s"scala-$SCALA_COMPILE_VERSION")
+ .resolve("jars")
+ if (!Files.exists(devHadoopJars)) {
+ throw new KyuubiException(s"The path $devHadoopJars does not exist. " +
+ s"Please set FLINK_HADOOP_CLASSPATH or ${ENGINE_FLINK_EXTRA_CLASSPATH.key}" +
+ s" for configuring location of hadoop client jars, etc.")
+ }
+ classpathEntries.add(s"$devHadoopJars${File.separator}*")
+ }
+ }
+ command += classpathEntries.asScala.mkString(File.pathSeparator)
+ command += "org.apache.kyuubi.engine.flink.FlinkSQLEngine"
+
+ conf.getAll.foreach { case (k, v) =>
+ command += "--conf"
+ command += s"$k=$v"
+ }
+
+ processBuilder.command(command.toList.asJava)
+ processBuilder.redirectOutput(Redirect.INHERIT)
+ processBuilder.redirectError(Redirect.INHERIT)
+
+ info(s"starting flink local engine...")
+ engineProcess = processBuilder.start()
+ }
+
+ private def startMiniCluster(): Unit = {
+ val cfg = new MiniClusterConfiguration.Builder()
+ .setConfiguration(flinkConfig)
+ .setNumSlotsPerTaskManager(1)
+ .setNumTaskManagers(2)
+ .build
+ miniCluster = new MiniCluster(cfg)
+ miniCluster.start()
+ flinkConfig.setString(RestOptions.ADDRESS, miniCluster.getRestAddress.get().getHost)
+ flinkConfig.setInteger(RestOptions.PORT, miniCluster.getRestAddress.get().getPort)
+ }
+
+ protected def getJdbcUrl: String = s"jdbc:hive2://$connectionUrl/;"
+
+ def mainResource(env: Map[String, String]): Option[String] = {
+ val module = "kyuubi-flink-sql-engine"
+ val shortName = "flink"
+ // 1. get the main resource jar for user specified config first
+ val jarName = s"${module}_$SCALA_COMPILE_VERSION-$KYUUBI_VERSION.jar"
+ conf.getOption(s"kyuubi.session.engine.$shortName.main.resource").filter {
+ userSpecified =>
+ // skip the existence check if not a local file.
+ val uri = new URI(userSpecified)
+ val schema = if (uri.getScheme != null) uri.getScheme else "file"
+ schema match {
+ case "file" => Files.exists(Paths.get(userSpecified))
+ case _ => true
+ }
+ }.orElse {
+ // 2. get the main resource jar from system build default
+ env.get(KYUUBI_HOME).toSeq
+ .flatMap { p =>
+ Seq(
+ Paths.get(p, "externals", "engines", shortName, jarName),
+ Paths.get(p, "externals", module, "target", jarName))
+ }
+ .find(Files.exists(_)).map(_.toAbsolutePath.toFile.getCanonicalPath)
+ }.orElse {
+ // 3. get the main resource from dev environment
+ val cwd = Utils.getCodeSourceLocation(getClass).split("externals")
+ assert(cwd.length > 1)
+ Option(Paths.get(cwd.head, "externals", module, "target", jarName))
+ .map(_.toAbsolutePath.toFile.getCanonicalPath)
+ }
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineOnYarn.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineOnYarn.scala
new file mode 100644
index 00000000000..49fb947a3ec
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkSQLEngineOnYarn.scala
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink
+
+import java.io.{File, FilenameFilter, FileWriter}
+import java.lang.ProcessBuilder.Redirect
+import java.net.URI
+import java.nio.file.{Files, Paths}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.{ArrayBuffer, ListBuffer}
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.hdfs.MiniDFSCluster
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+import org.apache.hadoop.yarn.server.MiniYARNCluster
+
+import org.apache.kyuubi.{KYUUBI_VERSION, KyuubiFunSuite, SCALA_COMPILE_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf.{ENGINE_FLINK_APPLICATION_JARS, KYUUBI_HOME}
+import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ADDRESSES
+import org.apache.kyuubi.zookeeper.EmbeddedZookeeper
+import org.apache.kyuubi.zookeeper.ZookeeperConf.{ZK_CLIENT_PORT, ZK_CLIENT_PORT_ADDRESS}
+
+trait WithFlinkSQLEngineOnYarn extends KyuubiFunSuite with WithFlinkTestResources {
+
+ protected def engineRefId: String
+
+ protected val conf: KyuubiConf = new KyuubiConf(false)
+
+ private var hdfsCluster: MiniDFSCluster = _
+
+ private var yarnCluster: MiniYARNCluster = _
+
+ private var zkServer: EmbeddedZookeeper = _
+
+ def withKyuubiConf: Map[String, String] = testExtraConf
+
+ private val yarnConf: YarnConfiguration = {
+ val yarnConfig = new YarnConfiguration()
+
+ // configurations copied from org.apache.flink.yarn.YarnTestBase
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 32)
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 4096)
+
+ yarnConfig.setBoolean(YarnConfiguration.RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME, true)
+ yarnConfig.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, 2)
+ yarnConfig.setInt(YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS, 2)
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES, 4)
+ yarnConfig.setInt(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600)
+ yarnConfig.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, false)
+ // memory is overwritten in the MiniYARNCluster,
+ // so we have to change the number of cores for testing.
+ yarnConfig.setInt(YarnConfiguration.NM_VCORES, 666)
+ yarnConfig.setFloat(YarnConfiguration.NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE, 99.0f)
+ yarnConfig.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS, 1000)
+ yarnConfig.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 5000)
+
+ // capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround
+ yarnConfig.set("yarn.scheduler.capacity.root.queues", "default,four_cores_queue")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.root.default.capacity", 100)
+ yarnConfig.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100)
+ yarnConfig.set("yarn.scheduler.capacity.root.default.state", "RUNNING")
+ yarnConfig.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*")
+ yarnConfig.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-capacity", 100)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-applications", 10)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-allocation-vcores", 4)
+ yarnConfig.setFloat("yarn.scheduler.capacity.root.four_cores_queue.user-limit-factor", 1)
+ yarnConfig.set("yarn.scheduler.capacity.root.four_cores_queue.acl_submit_applications", "*")
+ yarnConfig.set("yarn.scheduler.capacity.root.four_cores_queue.acl_administer_queue", "*")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.node-locality-delay", -1)
+ // Set bind host to localhost to avoid java.net.BindException
+ yarnConfig.set(YarnConfiguration.RM_BIND_HOST, "localhost")
+ yarnConfig.set(YarnConfiguration.NM_BIND_HOST, "localhost")
+
+ yarnConfig
+ }
+
+ override def beforeAll(): Unit = {
+ zkServer = new EmbeddedZookeeper()
+ conf.set(ZK_CLIENT_PORT, 0).set(ZK_CLIENT_PORT_ADDRESS, "localhost")
+ zkServer.initialize(conf)
+ zkServer.start()
+ conf.set(HA_ADDRESSES, zkServer.getConnectString)
+
+ hdfsCluster = new MiniDFSCluster.Builder(new Configuration)
+ .numDataNodes(1)
+ .checkDataNodeAddrConfig(true)
+ .checkDataNodeHostConfig(true)
+ .build()
+
+ val hdfsServiceUrl = s"hdfs://localhost:${hdfsCluster.getNameNodePort}"
+ yarnConf.set("fs.defaultFS", hdfsServiceUrl)
+ yarnConf.addResource(hdfsCluster.getConfiguration(0))
+
+ val cp = System.getProperty("java.class.path")
+ // exclude kyuubi flink engine jar that has SPI for EmbeddedExecutorFactory
+ // which can't be initialized on the client side
+ val hadoopJars = cp.split(":").filter(s => !s.contains("flink") && !s.contains("log4j"))
+ val hadoopClasspath = hadoopJars.mkString(":")
+ yarnConf.set(YarnConfiguration.YARN_APPLICATION_CLASSPATH, hadoopClasspath)
+
+ yarnCluster = new MiniYARNCluster("flink-engine-cluster", 1, 1, 1)
+ yarnCluster.init(yarnConf)
+ yarnCluster.start()
+
+ val hadoopConfDir = Utils.createTempDir().toFile
+ val writer = new FileWriter(new File(hadoopConfDir, "core-site.xml"))
+ yarnCluster.getConfig.writeXml(writer)
+ writer.close()
+
+ val envs = scala.collection.mutable.Map[String, String]()
+ val kyuubiExternals = Utils.getCodeSourceLocation(getClass)
+ .split("externals").head
+ val flinkHome = {
+ val candidates = Paths.get(kyuubiExternals, "externals", "kyuubi-download", "target")
+ .toFile.listFiles(f => f.getName.contains("flink"))
+ if (candidates == null) None else candidates.map(_.toPath).headOption
+ }
+ if (flinkHome.isDefined) {
+ envs("FLINK_HOME") = flinkHome.get.toString
+ envs("FLINK_CONF_DIR") = Paths.get(flinkHome.get.toString, "conf").toString
+ }
+ envs("HADOOP_CLASSPATH") = hadoopClasspath
+ envs("HADOOP_CONF_DIR") = hadoopConfDir.getAbsolutePath
+
+ startFlinkEngine(envs.toMap)
+
+ super.beforeAll()
+ }
+
+ private def startFlinkEngine(envs: Map[String, String]): Unit = {
+ val processBuilder: ProcessBuilder = new ProcessBuilder
+ processBuilder.environment().putAll(envs.asJava)
+
+ conf.set(ENGINE_FLINK_APPLICATION_JARS, udfJar.getAbsolutePath)
+ val flinkExtraJars = extraFlinkJars(envs("FLINK_HOME"))
+ val command = new ArrayBuffer[String]()
+
+ command += s"${envs("FLINK_HOME")}${File.separator}bin/flink"
+ command += "run-application"
+ command += "-t"
+ command += "yarn-application"
+ command += s"-Dyarn.ship-files=${flinkExtraJars.mkString(";")}"
+ command += s"-Dyarn.application.name=kyuubi_user_flink_paul"
+ command += s"-Dyarn.tags=KYUUBI,$engineRefId"
+ command += "-Djobmanager.memory.process.size=1g"
+ command += "-Dtaskmanager.memory.process.size=1g"
+ command += "-Dcontainerized.master.env.FLINK_CONF_DIR=."
+ command += "-Dcontainerized.taskmanager.env.FLINK_CONF_DIR=."
+ command += s"-Dcontainerized.master.env.HADOOP_CONF_DIR=${envs("HADOOP_CONF_DIR")}"
+ command += s"-Dcontainerized.taskmanager.env.HADOOP_CONF_DIR=${envs("HADOOP_CONF_DIR")}"
+ command += "-Dexecution.target=yarn-application"
+ command += "-c"
+ command += "org.apache.kyuubi.engine.flink.FlinkSQLEngine"
+ command += s"${mainResource(envs).get}"
+
+ for ((k, v) <- withKyuubiConf) {
+ conf.set(k, v)
+ }
+
+ for ((k, v) <- conf.getAll) {
+ command += "--conf"
+ command += s"$k=$v"
+ }
+
+ processBuilder.command(command.toList.asJava)
+ processBuilder.redirectOutput(Redirect.INHERIT)
+ processBuilder.redirectError(Redirect.INHERIT)
+
+ info(s"starting flink yarn-application cluster for engine $engineRefId...")
+ val process = processBuilder.start()
+ process.waitFor()
+ info(s"flink yarn-application cluster for engine $engineRefId has started")
+ }
+
+ def extraFlinkJars(flinkHome: String): Array[String] = {
+ // locate flink sql jars
+ val flinkExtraJars = new ListBuffer[String]
+ val flinkSQLJars = Paths.get(flinkHome)
+ .resolve("opt")
+ .toFile
+ .listFiles(new FilenameFilter {
+ override def accept(dir: File, name: String): Boolean = {
+ name.toLowerCase.startsWith("flink-sql-client") ||
+ name.toLowerCase.startsWith("flink-sql-gateway")
+ }
+ }).map(f => f.getAbsolutePath).sorted
+ flinkExtraJars ++= flinkSQLJars
+
+ val userJars = conf.get(ENGINE_FLINK_APPLICATION_JARS)
+ userJars.foreach(jars => flinkExtraJars ++= jars.split(","))
+ flinkExtraJars.toArray
+ }
+
+ /**
+ * Copied from org.apache.kyuubi.engine.ProcBuilder.
+ * The engine jar or other runnable jar containing the main method.
+ */
+ def mainResource(env: Map[String, String]): Option[String] = {
+ // 1. get the main resource jar for user specified config first
+ val module = "kyuubi-flink-sql-engine"
+ val shortName = "flink"
+ val jarName = s"${module}_$SCALA_COMPILE_VERSION-$KYUUBI_VERSION.jar"
+ conf.getOption(s"kyuubi.session.engine.$shortName.main.resource").filter { userSpecified =>
+ // skip the existence check if not a local file.
+ val uri = new URI(userSpecified)
+ val schema = if (uri.getScheme != null) uri.getScheme else "file"
+ schema match {
+ case "file" => Files.exists(Paths.get(userSpecified))
+ case _ => true
+ }
+ }.orElse {
+ // 2. get the main resource jar from system build default
+ env.get(KYUUBI_HOME).toSeq
+ .flatMap { p =>
+ Seq(
+ Paths.get(p, "externals", "engines", shortName, jarName),
+ Paths.get(p, "externals", module, "target", jarName))
+ }
+ .find(Files.exists(_)).map(_.toAbsolutePath.toFile.getCanonicalPath)
+ }.orElse {
+ // 3. get the main resource from dev environment
+ val cwd = Utils.getCodeSourceLocation(getClass).split("externals")
+ assert(cwd.length > 1)
+ Option(Paths.get(cwd.head, "externals", module, "target", jarName))
+ .map(_.toAbsolutePath.toFile.getCanonicalPath)
+ }
+ }
+
+ override def afterAll(): Unit = {
+ super.afterAll()
+ if (yarnCluster != null) {
+ yarnCluster.stop()
+ yarnCluster = null
+ }
+ if (hdfsCluster != null) {
+ hdfsCluster.shutdown()
+ hdfsCluster = null
+ }
+ if (zkServer != null) {
+ zkServer.stop()
+ zkServer = null
+ }
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkTestResources.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkTestResources.scala
new file mode 100644
index 00000000000..3b1d65cb233
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/WithFlinkTestResources.scala
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink
+
+import java.io.File
+
+import org.apache.kyuubi.Utils
+import org.apache.kyuubi.engine.flink.util.TestUserClassLoaderJar
+
+trait WithFlinkTestResources {
+
+ protected val GENERATED_UDF_CLASS: String = "LowerUDF"
+
+ protected val GENERATED_UDF_CODE: String =
+ s"""
+ public class $GENERATED_UDF_CLASS extends org.apache.flink.table.functions.ScalarFunction {
+ public String eval(String str) {
+ return str.toLowerCase();
+ }
+ }
+ """
+
+ protected val udfJar: File = TestUserClassLoaderJar.createJarFile(
+ Utils.createTempDir("test-jar").toFile,
+ "test-classloader-udf.jar",
+ GENERATED_UDF_CLASS,
+ GENERATED_UDF_CODE)
+
+ protected val savepointDir: File = Utils.createTempDir("savepoints").toFile
+
+ protected val testExtraConf: Map[String, String] = Map(
+ "flink.pipeline.name" -> "test-job",
+ "flink.state.savepoints.dir" -> savepointDir.toURI.toString)
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationLocalSuite.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationLocalSuite.scala
new file mode 100644
index 00000000000..279cbea22a4
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationLocalSuite.scala
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.operation
+
+import java.util.UUID
+
+import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiConf._
+import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
+import org.apache.kyuubi.engine.ShareLevel
+import org.apache.kyuubi.engine.flink.{WithDiscoveryFlinkSQLEngine, WithFlinkSQLEngineLocal}
+import org.apache.kyuubi.ha.HighAvailabilityConf.{HA_ENGINE_REF_ID, HA_NAMESPACE}
+import org.apache.kyuubi.operation.NoneMode
+
+class FlinkOperationLocalSuite extends FlinkOperationSuite
+ with WithDiscoveryFlinkSQLEngine with WithFlinkSQLEngineLocal {
+
+ protected def jdbcUrl: String = getFlinkEngineServiceUrl
+
+ override def withKyuubiConf: Map[String, String] = {
+ Map(
+ "flink.execution.target" -> "remote",
+ "flink.high-availability.cluster-id" -> "flink-mini-cluster",
+ "flink.app.name" -> "kyuubi_connection_flink_paul",
+ HA_NAMESPACE.key -> namespace,
+ HA_ENGINE_REF_ID.key -> engineRefId,
+ ENGINE_TYPE.key -> "FLINK_SQL",
+ ENGINE_SHARE_LEVEL.key -> shareLevel,
+ OPERATION_PLAN_ONLY_MODE.key -> NoneMode.name,
+ KYUUBI_SESSION_USER_KEY -> "paullin") ++ testExtraConf
+ }
+
+ override protected def engineRefId: String = UUID.randomUUID().toString
+
+ def namespace: String = "/kyuubi/flink-local-engine-test"
+
+ def shareLevel: String = ShareLevel.USER.toString
+
+ def engineType: String = "flink"
+
+ test("execute statement - kyuubi defined functions") {
+ withJdbcStatement() { statement =>
+ var resultSet = statement.executeQuery("select kyuubi_version() as kyuubi_version")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === KYUUBI_VERSION)
+
+ resultSet = statement.executeQuery("select kyuubi_engine_name() as engine_name")
+ assert(resultSet.next())
+      assert(resultSet.getString(1) === "kyuubi_connection_flink_paul")
+
+ resultSet = statement.executeQuery("select kyuubi_engine_id() as engine_id")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === "flink-mini-cluster")
+
+ resultSet = statement.executeQuery("select kyuubi_system_user() as `system_user`")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === Utils.currentUser)
+
+ resultSet = statement.executeQuery("select kyuubi_session_user() as `session_user`")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === "paullin")
+ }
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationOnYarnSuite.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationOnYarnSuite.scala
new file mode 100644
index 00000000000..401c3b0bdd0
--- /dev/null
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationOnYarnSuite.scala
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.flink.operation
+
+import java.util.UUID
+
+import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiConf.{ENGINE_SHARE_LEVEL, ENGINE_TYPE}
+import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
+import org.apache.kyuubi.engine.ShareLevel
+import org.apache.kyuubi.engine.flink.{WithDiscoveryFlinkSQLEngine, WithFlinkSQLEngineOnYarn}
+import org.apache.kyuubi.ha.HighAvailabilityConf.{HA_ENGINE_REF_ID, HA_NAMESPACE}
+
+class FlinkOperationOnYarnSuite extends FlinkOperationSuite
+ with WithDiscoveryFlinkSQLEngine with WithFlinkSQLEngineOnYarn {
+
+ protected def jdbcUrl: String = getFlinkEngineServiceUrl
+
+ override def withKyuubiConf: Map[String, String] = {
+ Map(
+ HA_NAMESPACE.key -> namespace,
+ HA_ENGINE_REF_ID.key -> engineRefId,
+ ENGINE_TYPE.key -> "FLINK_SQL",
+ ENGINE_SHARE_LEVEL.key -> shareLevel,
+ KYUUBI_SESSION_USER_KEY -> "paullin") ++ testExtraConf
+ }
+
+ override protected def engineRefId: String = UUID.randomUUID().toString
+
+ def namespace: String = "/kyuubi/flink-yarn-application-test"
+
+ def shareLevel: String = ShareLevel.USER.toString
+
+ def engineType: String = "flink"
+
+ test("execute statement - kyuubi defined functions") {
+ withJdbcStatement() { statement =>
+ var resultSet = statement.executeQuery("select kyuubi_version() as kyuubi_version")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === KYUUBI_VERSION)
+
+ resultSet = statement.executeQuery("select kyuubi_engine_name() as engine_name")
+ assert(resultSet.next())
+      assert(resultSet.getString(1) === "kyuubi_user_flink_paul")
+
+ resultSet = statement.executeQuery("select kyuubi_engine_id() as engine_id")
+ assert(resultSet.next())
+ assert(resultSet.getString(1).startsWith("application_"))
+
+ resultSet = statement.executeQuery("select kyuubi_system_user() as `system_user`")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === Utils.currentUser)
+
+ resultSet = statement.executeQuery("select kyuubi_session_user() as `session_user`")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === "paullin")
+ }
+ }
+}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala
index 5026fd41175..8e7c35a95a4 100644
--- a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/FlinkOperationSuite.scala
@@ -17,42 +17,29 @@
package org.apache.kyuubi.engine.flink.operation
+import java.nio.file.Paths
import java.sql.DatabaseMetaData
import java.util.UUID
import scala.collection.JavaConverters._
import org.apache.flink.api.common.JobID
+import org.apache.flink.configuration.PipelineOptions
import org.apache.flink.table.types.logical.LogicalTypeRoot
import org.apache.hive.service.rpc.thrift._
-import org.scalatest.concurrent.PatienceConfiguration.Timeout
-import org.scalatest.time.SpanSugar._
import org.apache.kyuubi.Utils
import org.apache.kyuubi.config.KyuubiConf._
-import org.apache.kyuubi.engine.flink.WithFlinkSQLEngine
+import org.apache.kyuubi.engine.flink.FlinkEngineUtils.FLINK_RUNTIME_VERSION
+import org.apache.kyuubi.engine.flink.WithFlinkTestResources
import org.apache.kyuubi.engine.flink.result.Constants
import org.apache.kyuubi.engine.flink.util.TestUserClassLoaderJar
-import org.apache.kyuubi.jdbc.hive.KyuubiStatement
-import org.apache.kyuubi.operation.{HiveJDBCTestHelper, NoneMode}
+import org.apache.kyuubi.jdbc.hive.{KyuubiSQLException, KyuubiStatement}
+import org.apache.kyuubi.jdbc.hive.common.TimestampTZ
+import org.apache.kyuubi.operation.HiveJDBCTestHelper
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
-import org.apache.kyuubi.service.ServiceState._
-class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
- override def withKyuubiConf: Map[String, String] =
- Map(OPERATION_PLAN_ONLY_MODE.key -> NoneMode.name)
-
- override protected def jdbcUrl: String =
- s"jdbc:hive2://${engine.frontendServices.head.connectionUrl}/;"
-
- ignore("release session if shared level is CONNECTION") {
- logger.info(s"jdbc url is $jdbcUrl")
- assert(engine.getServiceState == STARTED)
- withJdbcStatement() { _ => }
- eventually(Timeout(20.seconds)) {
- assert(engine.getServiceState == STOPPED)
- }
- }
+abstract class FlinkOperationSuite extends HiveJDBCTestHelper with WithFlinkTestResources {
test("get catalogs") {
withJdbcStatement() { statement =>
@@ -648,6 +635,60 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
}
+ test("execute statement - show/stop jobs") {
+ if (FLINK_RUNTIME_VERSION >= "1.17") {
+ withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "10"))(Map.empty) {
+ withMultipleConnectionJdbcStatement()({ statement =>
+ statement.executeQuery(
+ "create table tbl_a (a int) with (" +
+ "'connector' = 'datagen', " +
+ "'rows-per-second'='10')")
+ statement.executeQuery("create table tbl_b (a int) with ('connector' = 'blackhole')")
+ val insertResult1 = statement.executeQuery("insert into tbl_b select * from tbl_a")
+ assert(insertResult1.next())
+ val jobId1 = insertResult1.getString(1)
+
+          Thread.sleep(5000) // wait for the job to transition to RUNNING
+
+ val showResult = statement.executeQuery("show jobs")
+ val metadata = showResult.getMetaData
+ assert(metadata.getColumnName(1) === "job id")
+ assert(metadata.getColumnType(1) === java.sql.Types.VARCHAR)
+ assert(metadata.getColumnName(2) === "job name")
+ assert(metadata.getColumnType(2) === java.sql.Types.VARCHAR)
+ assert(metadata.getColumnName(3) === "status")
+ assert(metadata.getColumnType(3) === java.sql.Types.VARCHAR)
+ assert(metadata.getColumnName(4) === "start time")
+ assert(metadata.getColumnType(4) === java.sql.Types.OTHER)
+
+ var isFound = false
+ while (showResult.next()) {
+ if (showResult.getString(1) === jobId1) {
+ isFound = true
+ assert(showResult.getString(2) === "test-job")
+ assert(showResult.getString(3) === "RUNNING")
+ assert(showResult.getObject(4).isInstanceOf[TimestampTZ])
+ }
+ }
+ assert(isFound)
+
+ val stopResult1 = statement.executeQuery(s"stop job '$jobId1'")
+ assert(stopResult1.next())
+ assert(stopResult1.getString(1) === "OK")
+
+ val insertResult2 = statement.executeQuery("insert into tbl_b select * from tbl_a")
+ assert(insertResult2.next())
+ val jobId2 = insertResult2.getString(1)
+
+ val stopResult2 = statement.executeQuery(s"stop job '$jobId2' with savepoint")
+ assert(stopResult2.getMetaData.getColumnName(1).equals("savepoint path"))
+ assert(stopResult2.next())
+ assert(Paths.get(stopResult2.getString(1)).getFileName.toString.startsWith("savepoint-"))
+ })
+ }
+ }
+ }
+
test("execute statement - select column name with dots") {
withJdbcStatement() { statement =>
val resultSet = statement.executeQuery("select 'tmp.hello'")
@@ -755,6 +796,23 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
}
+ test("execute statement - select timestamp with local time zone") {
+ withJdbcStatement() { statement =>
+ statement.executeQuery("CREATE VIEW T1 AS SELECT TO_TIMESTAMP_LTZ(4001, 3)")
+ statement.executeQuery("SET 'table.local-time-zone' = 'UTC'")
+ val resultSetUTC = statement.executeQuery("SELECT * FROM T1")
+ val metaData = resultSetUTC.getMetaData
+ assert(metaData.getColumnType(1) === java.sql.Types.OTHER)
+ assert(resultSetUTC.next())
+ assert(resultSetUTC.getString(1) === "1970-01-01 00:00:04.001 UTC")
+
+ statement.executeQuery("SET 'table.local-time-zone' = 'America/Los_Angeles'")
+ val resultSetPST = statement.executeQuery("SELECT * FROM T1")
+ assert(resultSetPST.next())
+ assert(resultSetPST.getString(1) === "1969-12-31 16:00:04.001 America/Los_Angeles")
+ }
+ }
+
test("execute statement - select time") {
withJdbcStatement() { statement =>
val resultSet =
@@ -775,7 +833,7 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
val metaData = resultSet.getMetaData
assert(metaData.getColumnType(1) === java.sql.Types.ARRAY)
assert(resultSet.next())
- val expected = "[v1,v2,v3]"
+ val expected = "[\"v1\",\"v2\",\"v3\"]"
assert(resultSet.getObject(1).toString == expected)
}
}
@@ -784,7 +842,8 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
withJdbcStatement() { statement =>
val resultSet = statement.executeQuery("select map ['k1', 'v1', 'k2', 'v2']")
assert(resultSet.next())
- assert(resultSet.getString(1) == "{k1=v1, k2=v2}")
+ assert(List("{k1=v1, k2=v2}", "{k2=v2, k1=v1}")
+ .contains(resultSet.getString(1)))
val metaData = resultSet.getMetaData
assert(metaData.getColumnType(1) === java.sql.Types.JAVA_OBJECT)
}
@@ -794,7 +853,7 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
withJdbcStatement() { statement =>
val resultSet = statement.executeQuery("select (1, '2', true)")
assert(resultSet.next())
- val expected = """{INT NOT NULL:1,CHAR(1) NOT NULL:2,BOOLEAN NOT NULL:true}"""
+ val expected = """{INT NOT NULL:1,CHAR(1) NOT NULL:"2",BOOLEAN NOT NULL:true}"""
assert(resultSet.getString(1) == expected)
val metaData = resultSet.getMetaData
assert(metaData.getColumnType(1) === java.sql.Types.STRUCT)
@@ -812,6 +871,16 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
}
+ test("execute statement - select varbinary") {
+ withJdbcStatement() { statement =>
+ val resultSet = statement.executeQuery("select cast('kyuubi' as varbinary)")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) == "kyuubi")
+ val metaData = resultSet.getMetaData
+ assert(metaData.getColumnType(1) === java.sql.Types.BINARY)
+ }
+ }
+
test("execute statement - select float") {
withJdbcStatement() { statement =>
val resultSet = statement.executeQuery("SELECT cast(0.1 as float)")
@@ -956,27 +1025,50 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
}
- test("execute statement - insert into") {
+ test("execute statement - batch insert into") {
withMultipleConnectionJdbcStatement() { statement =>
statement.executeQuery("create table tbl_a (a int) with ('connector' = 'blackhole')")
val resultSet = statement.executeQuery("insert into tbl_a select 1")
val metadata = resultSet.getMetaData
- assert(metadata.getColumnName(1) == "default_catalog.default_database.tbl_a")
- assert(metadata.getColumnType(1) == java.sql.Types.BIGINT)
+ assert(metadata.getColumnName(1) === "job id")
+ assert(metadata.getColumnType(1) === java.sql.Types.VARCHAR)
assert(resultSet.next())
- assert(resultSet.getLong(1) == -1L)
+ assert(resultSet.getString(1).length == 32)
}
}
+ test("execute statement - streaming insert into") {
+ withMultipleConnectionJdbcStatement()({ statement =>
+      // use a finite stream so the job finishes on its own, since stop-job statements
+      // are only supported on Flink 1.17+
+ statement.executeQuery(
+ "create table tbl_a (a int) with (" +
+ "'connector' = 'datagen', " +
+ "'rows-per-second'='10', " +
+ "'number-of-rows'='100')")
+ statement.executeQuery("create table tbl_b (a int) with ('connector' = 'blackhole')")
+ val resultSet = statement.executeQuery("insert into tbl_b select * from tbl_a")
+ val metadata = resultSet.getMetaData
+ assert(metadata.getColumnName(1) === "job id")
+ assert(metadata.getColumnType(1) === java.sql.Types.VARCHAR)
+ assert(resultSet.next())
+ val jobId = resultSet.getString(1)
+ assert(jobId.length == 32)
+
+ if (FLINK_RUNTIME_VERSION >= "1.17") {
+ val stopResult = statement.executeQuery(s"stop job '$jobId'")
+ assert(stopResult.next())
+ assert(stopResult.getString(1) === "OK")
+ }
+ })
+ }
+
test("execute statement - set properties") {
withMultipleConnectionJdbcStatement() { statement =>
val resultSet = statement.executeQuery("set table.dynamic-table-options.enabled = true")
val metadata = resultSet.getMetaData
- assert(metadata.getColumnName(1) == "key")
- assert(metadata.getColumnName(2) == "value")
+ assert(metadata.getColumnName(1) == "result")
assert(resultSet.next())
- assert(resultSet.getString(1) == "table.dynamic-table-options.enabled")
- assert(resultSet.getString(2) == "true")
+ assert(resultSet.getString(1) == "OK")
}
}
@@ -991,16 +1083,17 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
test("execute statement - reset property") {
+    val originalName = "test-job" // defined in WithFlinkTestResources
withMultipleConnectionJdbcStatement() { statement =>
- statement.executeQuery("set pipeline.jars = my.jar")
- statement.executeQuery("reset pipeline.jars")
+ statement.executeQuery(s"set ${PipelineOptions.NAME.key()} = wrong-name")
+ statement.executeQuery(s"reset ${PipelineOptions.NAME.key()}")
val resultSet = statement.executeQuery("set")
// Flink does not support set key without value currently,
// thus read all rows to find the desired one
var success = false
while (resultSet.next()) {
- if (resultSet.getString(1) == "pipeline.jars" &&
- !resultSet.getString(2).contains("my.jar")) {
+ if (resultSet.getString(1) == PipelineOptions.NAME.key() &&
+ resultSet.getString(2).equals(originalName)) {
success = true
}
}
@@ -1043,7 +1136,8 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
test("ensure result max rows") {
withSessionConf()(Map(ENGINE_FLINK_MAX_ROWS.key -> "200"))(Map.empty) {
withJdbcStatement() { statement =>
- statement.execute("create table tbl_src (a bigint) with ('connector' = 'datagen')")
+ statement.execute("create table tbl_src (a bigint) with (" +
+ "'connector' = 'datagen', 'number-of-rows' = '1000')")
val resultSet = statement.executeQuery(s"select a from tbl_src")
var rows = 0
while (resultSet.next()) {
@@ -1054,7 +1148,31 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
}
}
- test("execute statement - add/remove/show jar") {
+ test("execute statement - add/show jar") {
+ val jarName = s"newly-added-${UUID.randomUUID()}.jar"
+ val newJar = TestUserClassLoaderJar.createJarFile(
+ Utils.createTempDir("add-jar-test").toFile,
+ jarName,
+ GENERATED_UDF_CLASS,
+ GENERATED_UDF_CODE).toPath
+
+ withMultipleConnectionJdbcStatement()({ statement =>
+ statement.execute(s"add jar '$newJar'")
+
+ val showJarsResultAdded = statement.executeQuery("show jars")
+ var exists = false
+ while (showJarsResultAdded.next()) {
+ if (showJarsResultAdded.getString(1).contains(jarName)) {
+ exists = true
+ }
+ }
+ assert(exists)
+ })
+ }
+
+ // ignored because Flink gateway doesn't support remove-jar statements
+ // see org.apache.flink.table.gateway.service.operation.OperationExecutor#callRemoveJar(..)
+ ignore("execute statement - remove jar") {
val jarName = s"newly-added-${UUID.randomUUID()}.jar"
val newJar = TestUserClassLoaderJar.createJarFile(
Utils.createTempDir("add-jar-test").toFile,
@@ -1124,9 +1242,25 @@ class FlinkOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
assert(stmt.asInstanceOf[KyuubiStatement].getQueryId === null)
stmt.executeQuery("insert into tbl_a values (1)")
val queryId = stmt.asInstanceOf[KyuubiStatement].getQueryId
- assert(queryId !== null)
- // parse the string to check if it's valid Flink job id
- assert(JobID.fromHexString(queryId) !== null)
+ // Flink 1.16 doesn't support query id via ResultFetcher
+ if (FLINK_RUNTIME_VERSION >= "1.17") {
+ assert(queryId !== null)
+ // parse the string to check if it's valid Flink job id
+ assert(JobID.fromHexString(queryId) !== null)
+ }
}
}
+
+ test("test result fetch timeout") {
+ val exception = intercept[KyuubiSQLException](
+ withSessionConf()(Map(ENGINE_FLINK_FETCH_TIMEOUT.key -> "60000"))() {
+ withJdbcStatement("tbl_a") { stmt =>
+ stmt.executeQuery("create table tbl_a (a int) " +
+ "with ('connector' = 'datagen', 'rows-per-second'='0')")
+ val resultSet = stmt.executeQuery("select * from tbl_a")
+ while (resultSet.next()) {}
+ }
+ })
+ assert(exception.getMessage === "Futures timed out after [60000 milliseconds]")
+ }
}
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyOperationSuite.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyOperationSuite.scala
index 1194f3582b1..17c49464fae 100644
--- a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyOperationSuite.scala
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/operation/PlanOnlyOperationSuite.scala
@@ -18,21 +18,33 @@
package org.apache.kyuubi.engine.flink.operation
import java.sql.Statement
+import java.util.UUID
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.flink.WithFlinkSQLEngine
+import org.apache.kyuubi.engine.flink.{WithDiscoveryFlinkSQLEngine, WithFlinkSQLEngineLocal}
+import org.apache.kyuubi.ha.HighAvailabilityConf.{HA_ENGINE_REF_ID, HA_NAMESPACE}
import org.apache.kyuubi.operation.{AnalyzeMode, ExecutionMode, HiveJDBCTestHelper, ParseMode, PhysicalMode}
-class PlanOnlyOperationSuite extends WithFlinkSQLEngine with HiveJDBCTestHelper {
+class PlanOnlyOperationSuite extends WithFlinkSQLEngineLocal
+ with HiveJDBCTestHelper with WithDiscoveryFlinkSQLEngine {
+
+ override protected def engineRefId: String = UUID.randomUUID().toString
+
+ override protected def namespace: String = "/kyuubi/flink-plan-only-test"
+
+ def engineType: String = "flink"
override def withKyuubiConf: Map[String, String] =
Map(
+ "flink.execution.target" -> "remote",
+ HA_NAMESPACE.key -> namespace,
+ HA_ENGINE_REF_ID.key -> engineRefId,
+ KyuubiConf.ENGINE_TYPE.key -> "FLINK_SQL",
KyuubiConf.ENGINE_SHARE_LEVEL.key -> "user",
KyuubiConf.OPERATION_PLAN_ONLY_MODE.key -> ParseMode.name,
- KyuubiConf.ENGINE_SHARE_LEVEL_SUBDOMAIN.key -> "plan-only")
+ KyuubiConf.ENGINE_SHARE_LEVEL_SUBDOMAIN.key -> "plan-only") ++ testExtraConf
- override protected def jdbcUrl: String =
- s"jdbc:hive2://${engine.frontendServices.head.connectionUrl}/;"
+ override protected def jdbcUrl: String = getFlinkEngineServiceUrl
test("Plan only operation with system defaults") {
withJdbcStatement() { statement =>
diff --git a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/result/ResultSetSuite.scala b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/result/ResultSetSuite.scala
index 9190456b32b..9ee5c658bc9 100644
--- a/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/result/ResultSetSuite.scala
+++ b/externals/kyuubi-flink-sql-engine/src/test/scala/org/apache/kyuubi/engine/flink/result/ResultSetSuite.scala
@@ -17,6 +17,8 @@
package org.apache.kyuubi.engine.flink.result
+import java.time.ZoneId
+
import org.apache.flink.table.api.{DataTypes, ResultKind}
import org.apache.flink.table.catalog.Column
import org.apache.flink.table.data.StringData
@@ -44,9 +46,10 @@ class ResultSetSuite extends KyuubiFunSuite {
.data(rowsNew)
.build
- assert(RowSet.toRowBaseSet(rowsNew, resultSetNew)
- === RowSet.toRowBaseSet(rowsOld, resultSetOld))
- assert(RowSet.toColumnBasedSet(rowsNew, resultSetNew)
- === RowSet.toColumnBasedSet(rowsOld, resultSetOld))
+ val timeZone = ZoneId.of("America/Los_Angeles")
+ assert(RowSet.toRowBaseSet(rowsNew, resultSetNew, timeZone)
+ === RowSet.toRowBaseSet(rowsOld, resultSetOld, timeZone))
+ assert(RowSet.toColumnBasedSet(rowsNew, resultSetNew, timeZone)
+ === RowSet.toColumnBasedSet(rowsOld, resultSetOld, timeZone))
}
}
diff --git a/externals/kyuubi-hive-sql-engine/pom.xml b/externals/kyuubi-hive-sql-engine/pom.xml
index 0319d3dd2f3..caed7e27c37 100644
--- a/externals/kyuubi-hive-sql-engine/pom.xml
+++ b/externals/kyuubi-hive-sql-engine/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-hive-sql-engine_2.12</artifactId>
+    <artifactId>kyuubi-hive-sql-engine_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Project Engine Hive SQL</name>
     <url>https://kyuubi.apache.org/</url>
@@ -163,6 +163,12 @@
         <dependency>
             <groupId>com.zaxxer</groupId>
             <artifactId>HikariCP</artifactId>
             <scope>test</scope>
         </dependency>
+
+        <dependency>
+            <groupId>com.vladsch.flexmark</groupId>
+            <artifactId>flexmark-all</artifactId>
+            <scope>test</scope>
+        </dependency>
     </dependencies>
@@ -179,12 +185,7 @@
                                     <include>com.fasterxml.jackson.core:jackson-core</include>
                                     <include>com.fasterxml.jackson.core:jackson-databind</include>
                                     <include>com.fasterxml.jackson.module:jackson-module-scala_${scala.binary.version}</include>
-                                    <include>org.apache.kyuubi:kyuubi-common_${scala.binary.version}</include>
-                                    <include>org.apache.kyuubi:kyuubi-events_${scala.binary.version}</include>
-                                    <include>org.apache.kyuubi:kyuubi-ha_${scala.binary.version}</include>
-                                    <include>org.apache.curator:curator-client</include>
-                                    <include>org.apache.curator:curator-framework</include>
-                                    <include>org.apache.curator:curator-recipes</include>
+                                    <include>org.apache.kyuubi:*</include>
                                 </includes>
@@ -205,15 +206,6 @@
-
-                            <relocation>
-                                <pattern>org.apache.curator</pattern>
-                                <shadedPattern>${kyuubi.shade.packageName}.org.apache.curator</shadedPattern>
-                                <includes>
-                                    <include>org.apache.curator.**</include>
-                                </includes>
-                            </relocation>
-
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/HiveSQLEngine.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/HiveSQLEngine.scala
index 839da710e3e..3cc426c435a 100644
--- a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/HiveSQLEngine.scala
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/HiveSQLEngine.scala
@@ -18,6 +18,7 @@
package org.apache.kyuubi.engine.hive
import java.security.PrivilegedExceptionAction
+import java.time.Instant
import scala.util.control.NonFatal
@@ -65,6 +66,7 @@ object HiveSQLEngine extends Logging {
var currentEngine: Option[HiveSQLEngine] = None
val hiveConf = new HiveConf()
val kyuubiConf = new KyuubiConf()
+ val user = UserGroupInformation.getCurrentUser.getShortUserName
def startEngine(): Unit = {
try {
@@ -97,6 +99,8 @@ object HiveSQLEngine extends Logging {
}
val engine = new HiveSQLEngine()
+ val appName = s"kyuubi_${user}_hive_${Instant.now}"
+ hiveConf.setIfUnset("hive.engine.name", appName)
info(s"Starting ${engine.getName}")
engine.initialize(kyuubiConf)
EventBus.post(HiveEngineEvent(engine))
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperation.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperation.scala
index 81affdff3a3..9759fa00be4 100644
--- a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperation.scala
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperation.scala
@@ -21,9 +21,10 @@ import java.util.concurrent.Future
import org.apache.hive.service.cli.operation.{Operation, OperationManager}
import org.apache.hive.service.cli.session.{HiveSession, SessionManager => HiveSessionManager}
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TRowSet}
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp}
import org.apache.kyuubi.KyuubiSQLException
+import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
import org.apache.kyuubi.engine.hive.session.HiveSessionImpl
import org.apache.kyuubi.operation.{AbstractOperation, FetchOrientation, OperationState, OperationStatus}
import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
@@ -43,12 +44,14 @@ abstract class HiveOperation(session: Session) extends AbstractOperation(session
override def beforeRun(): Unit = {
setState(OperationState.RUNNING)
+ hive.getHiveConf.set(KYUUBI_SESSION_USER_KEY, session.user)
}
override def afterRun(): Unit = {
- state.synchronized {
+ withLockRequired {
if (!isTerminalState(state)) {
setState(OperationState.FINISHED)
+ hive.getHiveConf.unset(KYUUBI_SESSION_USER_KEY)
}
}
}
@@ -92,22 +95,31 @@ abstract class HiveOperation(session: Session) extends AbstractOperation(session
resp
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
val tOrder = FetchOrientation.toTFetchOrientation(order)
val hiveOrder = org.apache.hive.service.cli.FetchOrientation.getFetchOrientation(tOrder)
val rowSet = internalHiveOperation.getNextRowSet(hiveOrder, rowSetSize)
- rowSet.toTRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(rowSet.toTRowSet)
+ resp.setHasMoreRows(false)
+ resp
}
- def getOperationLogRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ def getOperationLogRowSet(order: FetchOrientation, rowSetSize: Int): TFetchResultsResp = {
val tOrder = FetchOrientation.toTFetchOrientation(order)
val hiveOrder = org.apache.hive.service.cli.FetchOrientation.getFetchOrientation(tOrder)
val handle = internalHiveOperation.getHandle
- delegatedOperationManager.getOperationLogRowSet(
+ val rowSet = delegatedOperationManager.getOperationLogRowSet(
handle,
hiveOrder,
rowSetSize,
hive.getHiveConf).toTRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(rowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override def isTimedOut: Boolean = internalHiveOperation.isTimedOut(System.currentTimeMillis)
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationManager.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationManager.scala
index 0762a2938e0..4e41e742e0b 100644
--- a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationManager.scala
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationManager.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.engine.hive.operation
import java.util.List
import org.apache.hadoop.hive.conf.HiveConf.ConfVars
-import org.apache.hive.service.rpc.thrift.TRowSet
+import org.apache.hive.service.rpc.thrift.TFetchResultsResp
import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.engine.hive.session.HiveSessionImpl
@@ -154,7 +154,7 @@ class HiveOperationManager() extends OperationManager("HiveOperationManager") {
override def getOperationLogRowSet(
opHandle: OperationHandle,
order: FetchOrientation,
- maxRows: Int): TRowSet = {
+ maxRows: Int): TFetchResultsResp = {
val operation = getOperation(opHandle).asInstanceOf[HiveOperation]
operation.getOperationLogRowSet(order, maxRows)
}
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/session/HiveSessionImpl.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/session/HiveSessionImpl.scala
index 3b85f94dfb9..5069b13798c 100644
--- a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/session/HiveSessionImpl.scala
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/session/HiveSessionImpl.scala
@@ -27,6 +27,7 @@ import org.apache.hive.service.rpc.thrift.{TGetInfoType, TGetInfoValue, TProtoco
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.engine.hive.events.HiveSessionEvent
+import org.apache.kyuubi.engine.hive.udf.KDFRegistry
import org.apache.kyuubi.events.EventBus
import org.apache.kyuubi.operation.{Operation, OperationHandle}
import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager}
@@ -48,6 +49,7 @@ class HiveSessionImpl(
val confClone = new HashMap[String, String]()
confClone.putAll(conf.asJava) // pass conf.asScala not support `put` method
hive.open(confClone)
+ KDFRegistry.registerAll()
EventBus.post(sessionEvent)
}
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KDFRegistry.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KDFRegistry.scala
new file mode 100644
index 00000000000..5ff468b7782
--- /dev/null
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KDFRegistry.scala
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.hive.udf
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.hive.ql.exec.{FunctionRegistry, UDFArgumentLengthException}
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, StringObjectInspector}
+
+import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_ID, KYUUBI_SESSION_USER_KEY}
+
+object KDFRegistry {
+
+ @transient
+ val registeredFunctions = new ArrayBuffer[KyuubiDefinedFunction]()
+
+ val kyuubi_version: KyuubiDefinedFunction = create(
+ "kyuubi_version",
+ new KyuubiVersionFunction,
+ "Return the version of Kyuubi Server",
+ "string",
+ "1.8.0")
+
+ val engine_name: KyuubiDefinedFunction = create(
+ "engine_name",
+ new EngineNameFunction,
+ "Return the name of engine",
+ "string",
+ "1.8.0")
+
+ val engine_id: KyuubiDefinedFunction = create(
+ "engine_id",
+ new EngineIdFunction,
+ "Return the id of engine",
+ "string",
+ "1.8.0")
+
+ val system_user: KyuubiDefinedFunction = create(
+ "system_user",
+ new SystemUserFunction,
+ "Return the system user",
+ "string",
+ "1.8.0")
+
+ val session_user: KyuubiDefinedFunction = create(
+ "session_user",
+ new SessionUserFunction,
+ "Return the session user",
+ "string",
+ "1.8.0")
+
+ def create(
+ name: String,
+ udf: GenericUDF,
+ description: String,
+ returnType: String,
+ since: String): KyuubiDefinedFunction = {
+ val kdf = KyuubiDefinedFunction(name, udf, description, returnType, since)
+ registeredFunctions += kdf
+ kdf
+ }
+
+ def registerAll(): Unit = {
+ for (func <- registeredFunctions) {
+ FunctionRegistry.registerTemporaryUDF(func.name, func.udf.getClass)
+ }
+ }
+}
+
+class KyuubiVersionFunction() extends GenericUDF {
+ private val returnOI: StringObjectInspector =
+ PrimitiveObjectInspectorFactory.javaStringObjectInspector
+ override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
+ if (arguments.length != 0) {
+ throw new UDFArgumentLengthException("The function kyuubi_version() takes no arguments, got "
+ + arguments.length)
+ }
+ returnOI
+ }
+
+ override def evaluate(arguments: Array[GenericUDF.DeferredObject]): AnyRef = KYUUBI_VERSION
+
+ override def getDisplayString(children: Array[String]): String = "kyuubi_version()"
+}
+
+class EngineNameFunction() extends GenericUDF {
+ private val returnOI: StringObjectInspector =
+ PrimitiveObjectInspectorFactory.javaStringObjectInspector
+ override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
+ if (arguments.length != 0) {
+ throw new UDFArgumentLengthException("The function engine_name() takes no arguments, got "
+ + arguments.length)
+ }
+ returnOI
+ }
+ override def evaluate(arguments: Array[GenericUDF.DeferredObject]): AnyRef =
+ SessionState.get.getConf.get("hive.engine.name", "")
+ override def getDisplayString(children: Array[String]): String = "engine_name()"
+}
+
+class EngineIdFunction() extends GenericUDF {
+ private val returnOI: StringObjectInspector =
+ PrimitiveObjectInspectorFactory.javaStringObjectInspector
+ override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
+ if (arguments.length != 0) {
+ throw new UDFArgumentLengthException("The function engine_id() takes no arguments, got "
+ + arguments.length)
+ }
+ returnOI
+ }
+
+ override def evaluate(arguments: Array[GenericUDF.DeferredObject]): AnyRef =
+ SessionState.get.getConf.get(KYUUBI_ENGINE_ID, "")
+
+ override def getDisplayString(children: Array[String]): String = "engine_id()"
+}
+
+class SystemUserFunction() extends GenericUDF {
+ private val returnOI: StringObjectInspector =
+ PrimitiveObjectInspectorFactory.javaStringObjectInspector
+ override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
+ if (arguments.length != 0) {
+ throw new UDFArgumentLengthException("The function system_user() takes no arguments, got "
+ + arguments.length)
+ }
+ returnOI
+ }
+
+ override def evaluate(arguments: Array[GenericUDF.DeferredObject]): AnyRef = Utils.currentUser
+
+ override def getDisplayString(children: Array[String]): String = "system_user()"
+}
+
+class SessionUserFunction() extends GenericUDF {
+ private val returnOI: StringObjectInspector =
+ PrimitiveObjectInspectorFactory.javaStringObjectInspector
+ override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
+ if (arguments.length != 0) {
+ throw new UDFArgumentLengthException("The function session_user() takes no arguments, got "
+ + arguments.length)
+ }
+ returnOI
+ }
+
+ override def evaluate(arguments: Array[GenericUDF.DeferredObject]): AnyRef = {
+ SessionState.get.getConf.get(KYUUBI_SESSION_USER_KEY, "")
+ }
+
+ override def getDisplayString(children: Array[String]): String = "session_user()"
+}
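The `KDFRegistry` above follows a create-then-register pattern: every `create` call records the function in a shared buffer, and `registerAll` replays that buffer when a session opens. A minimal, Hive-free sketch of the same pattern (all names here are illustrative stand-ins, not Kyuubi APIs):

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative stand-ins for KyuubiDefinedFunction/KDFRegistry; no Hive classes involved.
case class DefinedFunction(name: String, eval: () => String, since: String)

object Registry {
  val registeredFunctions = new ArrayBuffer[DefinedFunction]()

  // Mirrors KDFRegistry.create: build the function, record it, return it.
  def create(name: String, eval: () => String, since: String): DefinedFunction = {
    val fn = DefinedFunction(name, eval, since)
    registeredFunctions += fn
    fn
  }

  val kyuubiVersion: DefinedFunction = create("kyuubi_version", () => "1.8.0", "1.8.0")

  // Where KDFRegistry hands each UDF class to Hive's FunctionRegistry,
  // this sketch just materializes a name -> function lookup table.
  def registerAll(): Map[String, DefinedFunction] =
    registeredFunctions.map(f => f.name -> f).toMap
}
```

Keeping registration in one pass also means the same buffer can later drive documentation generation, which is exactly what the golden-file suite below does.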
diff --git a/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunction.scala b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunction.scala
new file mode 100644
index 00000000000..ee91a804e1f
--- /dev/null
+++ b/externals/kyuubi-hive-sql-engine/src/main/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunction.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.hive.udf
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
+
+/**
+ * A wrapper for Hive's [[GenericUDF]]
+ *
+ * @param name function name
+ * @param udf user-defined function
+ * @param description function description
+ * @param returnType function return type
+ * @param since the Kyuubi version in which the function was added
+ */
+case class KyuubiDefinedFunction(
+ name: String,
+ udf: GenericUDF,
+ description: String,
+ returnType: String,
+ since: String)
diff --git a/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationSuite.scala b/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationSuite.scala
index f949ec37ab7..eb10e0b4144 100644
--- a/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationSuite.scala
+++ b/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/operation/HiveOperationSuite.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.hive.operation
import org.apache.commons.lang3.{JavaVersion, SystemUtils}
-import org.apache.kyuubi.{HiveEngineTests, Utils}
+import org.apache.kyuubi.{HiveEngineTests, KYUUBI_VERSION, Utils}
import org.apache.kyuubi.engine.hive.HiveSQLEngine
import org.apache.kyuubi.jdbc.hive.KyuubiStatement
@@ -49,4 +49,20 @@ class HiveOperationSuite extends HiveEngineTests {
assert(kyuubiStatement.getQueryId != null)
}
}
+
+ test("kyuubi defined function - kyuubi_version") {
+ withJdbcStatement("hive_engine_test") { statement =>
+ val rs = statement.executeQuery("SELECT kyuubi_version()")
+ assert(rs.next())
+ assert(rs.getString(1) == KYUUBI_VERSION)
+ }
+ }
+
+ test("kyuubi defined function - engine_name") {
+ withJdbcStatement("hive_engine_test") { statement =>
+ val rs = statement.executeQuery("SELECT engine_name()")
+ assert(rs.next())
+ assert(rs.getString(1).nonEmpty)
+ }
+ }
}
diff --git a/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunctionSuite.scala b/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunctionSuite.scala
new file mode 100644
index 00000000000..08cb143e04a
--- /dev/null
+++ b/externals/kyuubi-hive-sql-engine/src/test/scala/org/apache/kyuubi/engine/hive/udf/KyuubiDefinedFunctionSuite.scala
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.hive.udf
+
+import java.nio.file.Paths
+
+import org.apache.kyuubi.{KyuubiFunSuite, MarkdownBuilder, Utils}
+import org.apache.kyuubi.util.GoldenFileUtils._
+
+/**
+ * End-to-end test cases for the auxiliary SQL functions doc file.
+ * The golden result file is "docs/extensions/engines/hive/functions.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ * KYUUBI_UPDATE=0 dev/gen/gen_hive_kdf_docs.sh
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ * dev/gen/gen_hive_kdf_docs.sh
+ * }}}
+ */
+class KyuubiDefinedFunctionSuite extends KyuubiFunSuite {
+
+ private val kyuubiHome: String = Utils.getCodeSourceLocation(getClass)
+ .split("kyuubi-hive-sql-engine")(0)
+ private val markdown =
+ Paths.get(kyuubiHome, "..", "docs", "extensions", "engines", "hive", "functions.md")
+ .toAbsolutePath
+
+ test("verify or update kyuubi hive sql functions") {
+ val builder = MarkdownBuilder(licenced = true, getClass.getName)
+
+ builder += "# Auxiliary SQL Functions" +=
+ """Kyuubi provides several auxiliary SQL functions as supplement to Hive's
+ | [Built-in Functions](https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#
+ |LanguageManualUDF-Built-inFunctions)""" ++=
+ """
+ | Name | Description | Return Type | Since
+ | --- | --- | --- | ---
+ |"""
+ KDFRegistry.registeredFunctions.foreach { func =>
+ builder += s"${func.name} | ${func.description} | ${func.returnType} | ${func.since}"
+ }
+
+ verifyOrRegenerateGoldenFile(markdown, builder.toMarkdown, "dev/gen/gen_hive_kdf_docs.sh")
+ }
+}
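The suite relies on a verify-or-regenerate golden-file flow: in verification mode the generated markdown must match the checked-in file, while the generator script overwrites it. A self-contained sketch of that flow (the helper name and the boolean flag are assumptions, not the actual `GoldenFileUtils` API):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Sketch of a verify-or-regenerate check for a golden doc file.
def verifyOrRegenerate(golden: Path, generated: String, regenerate: Boolean): Unit = {
  if (regenerate) {
    // generator mode: overwrite the golden file with freshly generated content
    Files.write(golden, generated.getBytes(StandardCharsets.UTF_8))
  } else {
    // verification mode: fail when the doc has drifted from the code
    val actual = new String(Files.readAllBytes(golden), StandardCharsets.UTF_8)
    assert(actual == generated, s"$golden is out of date; re-run the generator script")
  }
}
```

The payoff of this design is that the functions table in the docs can never silently drift from `KDFRegistry`: CI runs the suite in verification mode, and contributors regenerate via the script.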
diff --git a/externals/kyuubi-jdbc-engine/pom.xml b/externals/kyuubi-jdbc-engine/pom.xml
index 4bcc4fb601f..3c21fed570f 100644
--- a/externals/kyuubi-jdbc-engine/pom.xml
+++ b/externals/kyuubi-jdbc-engine/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-jdbc-engine_2.12</artifactId>
+    <artifactId>kyuubi-jdbc-engine_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Project Engine JDBC</name>
     <url>https://kyuubi.apache.org/</url>
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/JdbcSQLEngine.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/JdbcSQLEngine.scala
index 618098f31b9..6e0647f6c7a 100644
--- a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/JdbcSQLEngine.scala
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/JdbcSQLEngine.scala
@@ -19,7 +19,9 @@ package org.apache.kyuubi.engine.jdbc
import org.apache.kyuubi.{Logging, Utils}
import org.apache.kyuubi.Utils.{addShutdownHook, JDBC_ENGINE_SHUTDOWN_PRIORITY}
import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf.ENGINE_JDBC_INITIALIZE_SQL
import org.apache.kyuubi.engine.jdbc.JdbcSQLEngine.currentEngine
+import org.apache.kyuubi.engine.jdbc.util.KyuubiJdbcUtils
import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ZK_CONN_RETRY_POLICY
import org.apache.kyuubi.ha.client.RetryPolicies
import org.apache.kyuubi.service.Serverable
@@ -71,6 +73,8 @@ object JdbcSQLEngine extends Logging {
kyuubiConf.setIfMissing(HA_ZK_CONN_RETRY_POLICY, RetryPolicies.N_TIME.toString)
startEngine()
+
+ KyuubiJdbcUtils.initializeJdbcSession(kyuubiConf, kyuubiConf.get(ENGINE_JDBC_INITIALIZE_SQL))
} catch {
case t: Throwable if currentEngine.isDefined =>
currentEngine.foreach { engine =>
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/connection/ConnectionProvider.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/connection/ConnectionProvider.scala
index 798c92fbe41..cb6e4b6c551 100644
--- a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/connection/ConnectionProvider.scala
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/connection/ConnectionProvider.scala
@@ -16,26 +16,25 @@
*/
package org.apache.kyuubi.engine.jdbc.connection
-import java.sql.{Connection, DriverManager}
-import java.util.ServiceLoader
-
-import scala.collection.mutable.ArrayBuffer
+import java.sql.{Connection, Driver, DriverManager}
import org.apache.kyuubi.Logging
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf.{ENGINE_JDBC_CONNECTION_PROVIDER, ENGINE_JDBC_CONNECTION_URL, ENGINE_JDBC_DRIVER_CLASS}
+import org.apache.kyuubi.util.reflect.DynClasses
+import org.apache.kyuubi.util.reflect.ReflectUtils._
abstract class AbstractConnectionProvider extends Logging {
protected val providers = loadProviders()
def getProviderClass(kyuubiConf: KyuubiConf): String = {
- val specifiedDriverClass = kyuubiConf.get(ENGINE_JDBC_DRIVER_CLASS)
- specifiedDriverClass.foreach(Class.forName)
-
- specifiedDriverClass.getOrElse {
+ val driverClass: Class[_ <: Driver] = Option(
+ DynClasses.builder().impl(kyuubiConf.get(ENGINE_JDBC_DRIVER_CLASS).get)
+ .orNull().build[Driver]()).getOrElse {
val url = kyuubiConf.get(ENGINE_JDBC_CONNECTION_URL).get
- DriverManager.getDriver(url).getClass.getCanonicalName
+ DriverManager.getDriver(url).getClass
}
+ driverClass.getCanonicalName
}
def create(kyuubiConf: KyuubiConf): Connection = {
@@ -69,27 +68,12 @@ abstract class AbstractConnectionProvider extends Logging {
selectedProvider.getConnection(kyuubiConf)
}
- def loadProviders(): Seq[JdbcConnectionProvider] = {
- val loader = ServiceLoader.load(
- classOf[JdbcConnectionProvider],
- Thread.currentThread().getContextClassLoader)
- val providers = ArrayBuffer[JdbcConnectionProvider]()
-
- val iterator = loader.iterator()
- while (iterator.hasNext) {
- try {
- val provider = iterator.next()
+ def loadProviders(): Seq[JdbcConnectionProvider] =
+ loadFromServiceLoader[JdbcConnectionProvider]()
+ .map { provider =>
info(s"Loaded provider: $provider")
- providers += provider
- } catch {
- case t: Throwable =>
- warn(s"Loaded of the provider failed with the exception", t)
- }
- }
-
- // TODO support disable provider
- providers
- }
+ provider
+ }.toSeq
}
object ConnectionProvider extends AbstractConnectionProvider
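The refactor above replaces a hand-rolled `ServiceLoader` loop with a `loadFromServiceLoader` helper from `ReflectUtils`. Stripped of per-provider error handling, the helper boils down to the following sketch (`java.sql.Driver` is used only because it is a standard service interface available on any JDK):

```scala
import java.util.ServiceLoader
import scala.jdk.CollectionConverters._

// Discover all implementations of a service interface registered on the
// classpath (via META-INF/services) and materialize them as a Scala Seq.
def loadFromServiceLoader[T](cls: Class[T]): Seq[T] =
  ServiceLoader.load(cls, Thread.currentThread().getContextClassLoader)
    .iterator().asScala.toSeq

val drivers = loadFromServiceLoader(classOf[java.sql.Driver])
```

Note that the removed code swallowed per-provider instantiation failures with a warning; a shared helper centralizes that policy instead of duplicating the loop in every loader.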
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/dialect/JdbcDialect.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/dialect/JdbcDialect.scala
index b7ac7f43b0f..e08b2275875 100644
--- a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/dialect/JdbcDialect.scala
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/dialect/JdbcDialect.scala
@@ -18,9 +18,6 @@ package org.apache.kyuubi.engine.jdbc.dialect
import java.sql.{Connection, Statement}
import java.util
-import java.util.ServiceLoader
-
-import scala.collection.JavaConverters._
import org.apache.kyuubi.{KyuubiException, Logging}
import org.apache.kyuubi.config.KyuubiConf
@@ -29,6 +26,7 @@ import org.apache.kyuubi.engine.jdbc.schema.{RowSetHelper, SchemaHelper}
import org.apache.kyuubi.engine.jdbc.util.SupportServiceLoader
import org.apache.kyuubi.operation.Operation
import org.apache.kyuubi.session.Session
+import org.apache.kyuubi.util.reflect.ReflectUtils._
abstract class JdbcDialect extends SupportServiceLoader with Logging {
@@ -75,9 +73,8 @@ object JdbcDialects extends Logging {
assert(url.length > 5 && url.substring(5).contains(":"))
url.substring(5, url.indexOf(":", 5))
}
- val serviceLoader =
- ServiceLoader.load(classOf[JdbcDialect], Thread.currentThread().getContextClassLoader)
- serviceLoader.asScala.filter(_.name().equalsIgnoreCase(shortName)).toList match {
+ loadFromServiceLoader[JdbcDialect]()
+ .filter(_.name().equalsIgnoreCase(shortName)).toList match {
case Nil =>
throw new KyuubiException(s"Don't find jdbc dialect implement for jdbc engine: $shortName.")
case head :: Nil =>
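The dialect lookup above keys on the JDBC sub-protocol, i.e. the token between the first and second `:` of the connection URL. The extraction logic in isolation:

```scala
// "jdbc:mysql://host:3306/db" -> "mysql"; fails fast on malformed URLs.
def shortName(url: String): String = {
  require(url.length > 5 && url.substring(5).contains(":"), s"malformed JDBC URL: $url")
  url.substring(5, url.indexOf(":", 5))
}
```

Each `JdbcDialect` implementation advertises this short name via `name()`, so adding a new dialect is a matter of shipping an implementation plus a service file.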
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/operation/JdbcOperation.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/operation/JdbcOperation.scala
index 6cac42f49ef..2ca17375717 100644
--- a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/operation/JdbcOperation.scala
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/operation/JdbcOperation.scala
@@ -16,7 +16,7 @@
*/
package org.apache.kyuubi.engine.jdbc.operation
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TRowSet}
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp, TRowSet}
import org.apache.kyuubi.{KyuubiSQLException, Utils}
import org.apache.kyuubi.config.KyuubiConf
@@ -36,7 +36,9 @@ abstract class JdbcOperation(session: Session) extends AbstractOperation(session
protected lazy val dialect: JdbcDialect = JdbcDialects.get(conf)
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
@@ -51,7 +53,10 @@ abstract class JdbcOperation(session: Session) extends AbstractOperation(session
val taken = iter.take(rowSetSize)
val resultRowSet = toTRowSet(taken)
resultRowSet.setStartRowOffset(iter.getPosition)
- resultRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(resultRowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override def cancel(): Unit = {
@@ -66,7 +71,7 @@ abstract class JdbcOperation(session: Session) extends AbstractOperation(session
// We should use Throwable instead of Exception since `java.lang.NoClassDefFoundError`
// could be thrown.
case e: Throwable =>
- state.synchronized {
+ withLockRequired {
val errMsg = Utils.stringifyException(e)
if (state == OperationState.TIMEOUT) {
val ke = KyuubiSQLException(s"Timeout operating $opType: $errMsg")
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/session/JdbcSessionImpl.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/session/JdbcSessionImpl.scala
index f8cd40412f0..8b36e5a56df 100644
--- a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/session/JdbcSessionImpl.scala
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/session/JdbcSessionImpl.scala
@@ -23,8 +23,11 @@ import scala.util.{Failure, Success, Try}
import org.apache.hive.service.rpc.thrift.{TGetInfoType, TGetInfoValue, TProtocolVersion}
import org.apache.kyuubi.KyuubiSQLException
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_HANDLE_KEY
import org.apache.kyuubi.engine.jdbc.connection.ConnectionProvider
+import org.apache.kyuubi.engine.jdbc.util.KyuubiJdbcUtils
import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager}
class JdbcSessionImpl(
@@ -43,7 +46,16 @@ class JdbcSessionImpl(
private var databaseMetaData: DatabaseMetaData = _
- private val kyuubiConf = sessionManager.getConf
+ private val kyuubiConf: KyuubiConf = normalizeConf
+
+ private def normalizeConf: KyuubiConf = {
+ val kyuubiConf = sessionManager.getConf.clone
+ if (kyuubiConf.get(ENGINE_JDBC_CONNECTION_PROPAGATECREDENTIAL)) {
+ kyuubiConf.set(ENGINE_JDBC_CONNECTION_USER, user)
+ kyuubiConf.set(ENGINE_JDBC_CONNECTION_PASSWORD, password)
+ }
+ kyuubiConf
+ }
override def open(): Unit = {
info(s"Starting to open jdbc session.")
@@ -51,6 +63,10 @@ class JdbcSessionImpl(
sessionConnection = ConnectionProvider.create(kyuubiConf)
databaseMetaData = sessionConnection.getMetaData
}
+ KyuubiJdbcUtils.initializeJdbcSession(
+ kyuubiConf,
+ sessionConnection,
+ kyuubiConf.get(ENGINE_JDBC_SESSION_INITIALIZE_SQL))
super.open()
info(s"The jdbc session is started.")
}
diff --git a/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/util/KyuubiJdbcUtils.scala b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/util/KyuubiJdbcUtils.scala
new file mode 100644
index 00000000000..7107045ff14
--- /dev/null
+++ b/externals/kyuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/util/KyuubiJdbcUtils.scala
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.jdbc.util
+
+import java.sql.Connection
+
+import org.apache.kyuubi.{KyuubiSQLException, Logging}
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.engine.jdbc.connection.ConnectionProvider
+import org.apache.kyuubi.engine.jdbc.dialect.{JdbcDialect, JdbcDialects}
+import org.apache.kyuubi.util.JdbcUtils
+
+object KyuubiJdbcUtils extends Logging {
+
+ def initializeJdbcSession(kyuubiConf: KyuubiConf, initializationSQLs: Seq[String]): Unit = {
+ JdbcUtils.withCloseable(ConnectionProvider.create(kyuubiConf)) { connection =>
+ initializeJdbcSession(kyuubiConf, connection, initializationSQLs)
+ }
+ }
+
+ def initializeJdbcSession(
+ kyuubiConf: KyuubiConf,
+ connection: Connection,
+ initializationSQLs: Seq[String]): Unit = {
+ if (initializationSQLs == null || initializationSQLs.isEmpty) {
+ return
+ }
+ try {
+ val dialect: JdbcDialect = JdbcDialects.get(kyuubiConf)
+ JdbcUtils.withCloseable(dialect.createStatement(connection)) { statement =>
+ initializationSQLs.foreach { sql =>
+ debug(s"Execute initialization sql: $sql")
+ statement.execute(sql)
+ }
+ }
+ } catch {
+ case e: Exception =>
+ error("Failed to execute initialization sql.", e)
+ throw KyuubiSQLException(e)
+ }
+ }
+}
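`initializeJdbcSession` leans on `JdbcUtils.withCloseable` to guarantee the connection and statement are released. That helper is a plain loan pattern, roughly:

```scala
// Run a block against a resource and close it no matter how the block exits.
def withCloseable[R <: AutoCloseable, T](resource: R)(block: R => T): T = {
  try block(resource)
  finally resource.close()
}
```

Because both the bootstrap path (engine-level `ENGINE_JDBC_INITIALIZE_SQL`) and the per-session path (`ENGINE_JDBC_SESSION_INITIALIZE_SQL`) funnel through the same `initializeJdbcSession` overloads, initialization SQL gets identical error handling in both cases.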
diff --git a/externals/kyuubi-spark-sql-engine/pom.xml b/externals/kyuubi-spark-sql-engine/pom.xml
index 5b227cb5e29..c453bd28382 100644
--- a/externals/kyuubi-spark-sql-engine/pom.xml
+++ b/externals/kyuubi-spark-sql-engine/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-spark-sql-engine_2.12</artifactId>
+    <artifactId>kyuubi-spark-sql-engine_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Project Engine Spark SQL</name>
     <url>https://kyuubi.apache.org/</url>
@@ -65,6 +65,13 @@
             <scope>provided</scope>
         </dependency>

+        <dependency>
+            <groupId>org.apache.spark</groupId>
+            <artifactId>spark-sql_${scala.binary.version}</artifactId>
+            <type>test-jar</type>
+            <scope>test</scope>
+        </dependency>
+
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-repl_${scala.binary.version}</artifactId>
@@ -140,69 +147,77 @@
         </dependency>

-        <dependency>
-            <groupId>org.apache.parquet</groupId>
-            <artifactId>parquet-avro</artifactId>
-            <scope>test</scope>
-        </dependency>
-
-        <dependency>
-            <groupId>org.apache.spark</groupId>
-            <artifactId>spark-avro_${scala.binary.version}</artifactId>
-            <scope>test</scope>
-        </dependency>
-
         <dependency>
-            <groupId>org.apache.hudi</groupId>
-            <artifactId>hudi-common</artifactId>
+            <groupId>io.delta</groupId>
+            <artifactId>delta-core_${scala.binary.version}</artifactId>
             <scope>test</scope>
         </dependency>

         <dependency>
-            <groupId>org.apache.hudi</groupId>
-            <artifactId>hudi-spark-common_${scala.binary.version}</artifactId>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-zookeeper_${scala.binary.version}</artifactId>
+            <version>${project.version}</version>
             <scope>test</scope>
         </dependency>

         <dependency>
-            <groupId>org.apache.hudi</groupId>
-            <artifactId>hudi-spark_${scala.binary.version}</artifactId>
+            <groupId>com.dimafeng</groupId>
+            <artifactId>testcontainers-scala-scalatest_${scala.binary.version}</artifactId>
             <scope>test</scope>
         </dependency>

         <dependency>
-            <groupId>org.apache.hudi</groupId>
-            <artifactId>hudi-spark3.1.x_${scala.binary.version}</artifactId>
+            <groupId>io.etcd</groupId>
+            <artifactId>jetcd-launcher</artifactId>
             <scope>test</scope>
         </dependency>

         <dependency>
-            <groupId>io.delta</groupId>
-            <artifactId>delta-core_${scala.binary.version}</artifactId>
+            <groupId>com.vladsch.flexmark</groupId>
+            <artifactId>flexmark-all</artifactId>
             <scope>test</scope>
         </dependency>

         <dependency>
             <groupId>org.apache.kyuubi</groupId>
-            <artifactId>kyuubi-zookeeper_${scala.binary.version}</artifactId>
+            <artifactId>kyuubi-spark-lineage_${scala.binary.version}</artifactId>
             <version>${project.version}</version>
             <scope>test</scope>
         </dependency>
-
-        <dependency>
-            <groupId>io.etcd</groupId>
-            <artifactId>jetcd-launcher</artifactId>
-            <scope>test</scope>
-        </dependency>
-
-        <dependency>
-            <groupId>com.vladsch.flexmark</groupId>
-            <artifactId>flexmark-all</artifactId>
-            <scope>test</scope>
-        </dependency>
     </dependencies>

     <build>
         <plugins>
+            <plugin>
+                <groupId>org.codehaus.mojo</groupId>
+                <artifactId>build-helper-maven-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>add-scala-sources</id>
+                        <goals>
+                            <goal>add-source</goal>
+                        </goals>
+                        <phase>generate-sources</phase>
+                        <configuration>
+                            <sources>
+                                <source>src/main/scala-${scala.binary.version}</source>
+                            </sources>
+                        </configuration>
+                    </execution>
+                    <execution>
+                        <id>add-scala-test-sources</id>
+                        <goals>
+                            <goal>add-test-source</goal>
+                        </goals>
+                        <phase>generate-test-sources</phase>
+                        <configuration>
+                            <sources>
+                                <source>src/test/scala-${scala.binary.version}</source>
+                            </sources>
+                        </configuration>
+                    </execution>
+                </executions>
+            </plugin>
+
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-shade-plugin</artifactId>
@@ -223,15 +238,9 @@
                                     <include>io.perfmark:perfmark-api</include>
                                     <include>io.vertx:*</include>
                                     <include>net.jodah:failsafe</include>
-                                    <include>org.apache.curator:curator-client</include>
-                                    <include>org.apache.curator:curator-framework</include>
-                                    <include>org.apache.curator:curator-recipes</include>
                                     <include>org.apache.hive:hive-service-rpc</include>
-                                    <include>org.apache.kyuubi:kyuubi-common_${scala.binary.version}</include>
-                                    <include>org.apache.kyuubi:kyuubi-events_${scala.binary.version}</include>
-                                    <include>org.apache.kyuubi:kyuubi-ha_${scala.binary.version}</include>
+                                    <include>org.apache.kyuubi:*</include>
                                     <include>org.apache.thrift:*</include>
-                                    <include>org.apache.zookeeper:zookeeper</include>
                                     <include>org.checkerframework:checker-qual</include>
                                     <include>org.codehaus.mojo:animal-sniffer-annotations</include>
@@ -256,27 +265,6 @@
                                 </relocation>
-                                <relocation>
-                                    <pattern>org.apache.curator</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.curator</shadedPattern>
-                                    <includes>
-                                        <include>org.apache.curator.**</include>
-                                    </includes>
-                                </relocation>
-                                <relocation>
-                                    <pattern>org.apache.zookeeper</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.zookeeper</shadedPattern>
-                                    <includes>
-                                        <include>org.apache.zookeeper.**</include>
-                                    </includes>
-                                </relocation>
-                                <relocation>
-                                    <pattern>org.apache.jute</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.jute</shadedPattern>
-                                    <includes>
-                                        <include>org.apache.jute.**</include>
-                                    </includes>
-                                </relocation>
                                 <relocation>
                                     <pattern>org.apache.hive.service.rpc.thrift</pattern>
                                     <shadedPattern>${kyuubi.shade.packageName}.org.apache.hive.service.rpc.thrift</shadedPattern>
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala b/externals/kyuubi-spark-sql-engine/src/main/scala-2.12/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
similarity index 90%
rename from externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
rename to externals/kyuubi-spark-sql-engine/src/main/scala-2.12/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
index 27090fae4af..fbbda89edbd 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala-2.12/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
@@ -17,22 +17,23 @@
package org.apache.kyuubi.engine.spark.repl
-import java.io.{ByteArrayOutputStream, File}
+import java.io.{ByteArrayOutputStream, File, PrintWriter}
import java.util.concurrent.locks.ReentrantLock
import scala.tools.nsc.Settings
-import scala.tools.nsc.interpreter.IR
-import scala.tools.nsc.interpreter.JPrintWriter
+import scala.tools.nsc.interpreter.Results
import org.apache.spark.SparkContext
import org.apache.spark.repl.SparkILoop
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.util.MutableURLClassLoader
+import org.apache.kyuubi.Utils
+
private[spark] case class KyuubiSparkILoop private (
spark: SparkSession,
output: ByteArrayOutputStream)
- extends SparkILoop(None, new JPrintWriter(output)) {
+ extends SparkILoop(None, new PrintWriter(output)) {
import KyuubiSparkILoop._
val result = new DataFrameHolder(spark)
@@ -100,7 +101,7 @@ private[spark] case class KyuubiSparkILoop private (
def clearResult(statementId: String): Unit = result.unset(statementId)
- def interpretWithRedirectOutError(statement: String): IR.Result = withLockRequired {
+ def interpretWithRedirectOutError(statement: String): Results.Result = withLockRequired {
Console.withOut(output) {
Console.withErr(output) {
this.interpret(statement)
@@ -124,10 +125,5 @@ private[spark] object KyuubiSparkILoop {
}
private val lock = new ReentrantLock()
- private def withLockRequired[T](block: => T): T = {
- try {
- lock.lock()
- block
- } finally lock.unlock()
- }
+ private def withLockRequired[T](block: => T): T = Utils.withLockRequired(lock)(block)
}
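The duplicated `withLockRequired` helpers are consolidated into `Utils.withLockRequired`. Note the subtle improvement over the removed version: the lock is acquired *before* entering the `try`, so a failed `lock.lock()` can no longer reach the `finally` and call `unlock()` on a lock the thread never held. A sketch of the consolidated helper:

```scala
import java.util.concurrent.locks.ReentrantLock

// Acquire outside the try block so finally only runs once the lock is held.
def withLockRequired[T](lock: ReentrantLock)(block: => T): T = {
  lock.lock()
  try block
  finally lock.unlock()
}
```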
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala-2.13/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala b/externals/kyuubi-spark-sql-engine/src/main/scala-2.13/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
new file mode 100644
index 00000000000..a63d71a7885
--- /dev/null
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala-2.13/org/apache/kyuubi/engine/spark/repl/KyuubiSparkILoop.scala
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.spark.repl
+
+import java.io.{ByteArrayOutputStream, File, PrintWriter}
+import java.util.concurrent.locks.ReentrantLock
+
+import scala.tools.nsc.Settings
+import scala.tools.nsc.interpreter.{IMain, Results}
+
+import org.apache.spark.SparkContext
+import org.apache.spark.repl.SparkILoop
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.util.MutableURLClassLoader
+
+import org.apache.kyuubi.Utils
+
+private[spark] case class KyuubiSparkILoop private (
+ spark: SparkSession,
+ output: ByteArrayOutputStream)
+ extends SparkILoop(null, new PrintWriter(output)) {
+ import KyuubiSparkILoop._
+
+ val result = new DataFrameHolder(spark)
+
+ private def initialize(): Unit = withLockRequired {
+ val settings = new Settings
+ val interpArguments = List(
+ "-Yrepl-class-based",
+ "-Yrepl-outdir",
+ s"${spark.sparkContext.getConf.get("spark.repl.class.outputDir")}")
+ settings.processArguments(interpArguments, processAll = true)
+ settings.usejavacp.value = true
+ val currentClassLoader = Thread.currentThread().getContextClassLoader
+ settings.embeddedDefaults(currentClassLoader)
+ this.createInterpreter(settings)
+ val iMain = this.intp.asInstanceOf[IMain]
+ iMain.initializeCompiler()
+ try {
+ this.compilerClasspath
+ iMain.ensureClassLoader()
+ var classLoader: ClassLoader = Thread.currentThread().getContextClassLoader
+ while (classLoader != null) {
+ classLoader match {
+ case loader: MutableURLClassLoader =>
+ val allJars = loader.getURLs.filter { u =>
+ val file = new File(u.getPath)
+ u.getProtocol == "file" && file.isFile &&
+ file.getName.contains("scala-lang_scala-reflect")
+ }
+ this.addUrlsToClassPath(allJars: _*)
+ classLoader = null
+ case _ =>
+ classLoader = classLoader.getParent
+ }
+ }
+
+ this.addUrlsToClassPath(
+ classOf[DataFrameHolder].getProtectionDomain.getCodeSource.getLocation)
+ } finally {
+ Thread.currentThread().setContextClassLoader(currentClassLoader)
+ }
+
+ this.beQuietDuring {
+ // SparkSession/SparkContext and their implicits
+ this.bind("spark", classOf[SparkSession].getCanonicalName, spark, List("""@transient"""))
+ this.bind(
+ "sc",
+ classOf[SparkContext].getCanonicalName,
+ spark.sparkContext,
+ List("""@transient"""))
+
+ this.interpret("import org.apache.spark.SparkContext._")
+ this.interpret("import spark.implicits._")
+ this.interpret("import spark.sql")
+ this.interpret("import org.apache.spark.sql.functions._")
+
+ // for feeding results to client, e.g. beeline
+ this.bind(
+ "result",
+ classOf[DataFrameHolder].getCanonicalName,
+ result)
+ }
+ }
+
+ def getResult(statementId: String): DataFrame = result.get(statementId)
+
+ def clearResult(statementId: String): Unit = result.unset(statementId)
+
+ def interpretWithRedirectOutError(statement: String): Results.Result = withLockRequired {
+ Console.withOut(output) {
+ Console.withErr(output) {
+ this.interpret(statement)
+ }
+ }
+ }
+
+ def getOutput: String = {
+ val res = output.toString.trim
+ output.reset()
+ res
+ }
+}
+
+private[spark] object KyuubiSparkILoop {
+ def apply(spark: SparkSession): KyuubiSparkILoop = {
+ val os = new ByteArrayOutputStream()
+ val iLoop = new KyuubiSparkILoop(spark, os)
+ iLoop.initialize()
+ iLoop
+ }
+
+ private val lock = new ReentrantLock()
+ private def withLockRequired[T](block: => T): T = Utils.withLockRequired(lock)(block)
+}
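`KyuubiSparkILoop` above drains REPL output by routing `Console` stdout/stderr into a shared `ByteArrayOutputStream` (see `interpretWithRedirectOutError` and `getOutput`). A minimal standalone sketch of that capture-and-drain pattern, with illustrative names:

```scala
import java.io.ByteArrayOutputStream

// Capture-and-drain pattern: route Console output into a buffer,
// then read it back and reset for the next statement.
object OutputCapture {
  private val output = new ByteArrayOutputStream()

  // run a block with both stdout and stderr redirected into the buffer
  def withCapturedOutput[T](block: => T): T =
    Console.withOut(output) {
      Console.withErr(output) {
        block
      }
    }

  // return everything captured so far and clear the buffer
  def drainOutput(): String = {
    val res = output.toString.trim
    output.reset()
    res
  }
}
```

The `trim` plus `reset` matches `getOutput` above: each drain returns only the output produced since the previous drain.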
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/KyuubiSparkUtil.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/KyuubiSparkUtil.scala
index 2c3e7195c43..b9fb9325999 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/KyuubiSparkUtil.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/KyuubiSparkUtil.scala
@@ -21,12 +21,12 @@ import java.time.{Instant, LocalDateTime, ZoneId}
import scala.annotation.meta.getter
-import org.apache.spark.SparkContext
+import org.apache.spark.{SPARK_VERSION, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.kvstore.KVIndex
import org.apache.kyuubi.Logging
-import org.apache.kyuubi.engine.SemanticVersion
+import org.apache.kyuubi.util.SemanticVersion
object KyuubiSparkUtil extends Logging {
@@ -95,9 +95,7 @@ object KyuubiSparkUtil extends Logging {
}
}
- lazy val sparkMajorMinorVersion: (Int, Int) = {
- val runtimeSparkVer = org.apache.spark.SPARK_VERSION
- val runtimeVersion = SemanticVersion(runtimeSparkVer)
- (runtimeVersion.majorVersion, runtimeVersion.minorVersion)
- }
+ // Since this code runs on the Spark SQL engine side, [[org.apache.spark.SPARK_VERSION]]
+ // represents the runtime version of the Spark SQL engine.
+ lazy val SPARK_ENGINE_RUNTIME_VERSION: SemanticVersion = SemanticVersion(SPARK_VERSION)
}
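The refactor above replaces the `(major, minor)` tuple with a `SemanticVersion` built from `SPARK_VERSION`. A rough, illustrative sketch of what such a version holder might look like — the real `org.apache.kyuubi.util.SemanticVersion` API may differ:

```scala
// Illustrative stand-in for a semantic version holder built from a
// version string such as org.apache.spark.SPARK_VERSION.
case class SimpleVersion(majorVersion: Int, minorVersion: Int)

object SimpleVersion {
  // "major.minor" with an optional trailing ".patch[-suffix]" part
  private val Pattern = """^(\d+)\.(\d+)(\..*)?$""".r

  def apply(version: String): SimpleVersion = version match {
    case Pattern(major, minor, _) => SimpleVersion(major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"Unexpected version string: $version")
  }
}
```

Holding a single comparable value instead of a bare tuple lets callers write version checks against the engine's runtime Spark version in one place.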
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
index 42e7c44a137..5f91bc73db5 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkSQLEngine.scala
@@ -17,7 +17,6 @@
package org.apache.kyuubi.engine.spark
-import java.net.InetAddress
import java.time.Instant
import java.util.{Locale, UUID}
import java.util.concurrent.{CountDownLatch, ScheduledExecutorService, ThreadPoolExecutor, TimeUnit}
@@ -36,7 +35,8 @@ import org.apache.kyuubi.{KyuubiException, Logging, Utils}
import org.apache.kyuubi.Utils._
import org.apache.kyuubi.config.{KyuubiConf, KyuubiReservedKeys}
import org.apache.kyuubi.config.KyuubiConf._
-import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_ENGINE_SUBMIT_TIME_KEY
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_SUBMIT_TIME_KEY, KYUUBI_ENGINE_URL}
+import org.apache.kyuubi.engine.ShareLevel
import org.apache.kyuubi.engine.spark.SparkSQLEngine.{countDownLatch, currentEngine}
import org.apache.kyuubi.engine.spark.events.{EngineEvent, EngineEventsStore, SparkEventHandlerRegister}
import org.apache.kyuubi.engine.spark.session.SparkSessionImpl
@@ -80,6 +80,12 @@ case class SparkSQLEngine(spark: SparkSession) extends Serverable("SparkSQLEngin
assert(currentEngine.isDefined)
currentEngine.get.stop()
})
+
+ val maxInitTimeout = conf.get(ENGINE_SPARK_MAX_INITIAL_WAIT)
+ if (conf.get(ENGINE_SHARE_LEVEL) == ShareLevel.CONNECTION.toString &&
+ maxInitTimeout > 0) {
+ startFastFailChecker(maxInitTimeout)
+ }
}
override def stop(): Unit = if (shutdown.compareAndSet(false, true)) {
@@ -114,6 +120,27 @@ case class SparkSQLEngine(spark: SparkSession) extends Serverable("SparkSQLEngin
stopEngineExec.get.execute(stopTask)
}
+ private[kyuubi] def startFastFailChecker(maxTimeout: Long): Unit = {
+ val startedTime = System.currentTimeMillis()
+ Utils.tryLogNonFatalError {
+ ThreadUtils.runInNewThread("spark-engine-failfast-checker") {
+ if (!shutdown.get) {
+ while (backendService.sessionManager.getOpenSessionCount <= 0 &&
+ System.currentTimeMillis() - startedTime < maxTimeout) {
+ info(s"Waiting for the initial connection")
+ Thread.sleep(Duration(10, TimeUnit.SECONDS).toMillis)
+ }
+ if (backendService.sessionManager.getOpenSessionCount <= 0) {
+ error(s"Spark engine has been terminated because there was no incoming connection" +
+ s" for more than $maxTimeout ms; de-registering from the engine discovery space.")
+ assert(currentEngine.isDefined)
+ currentEngine.get.stop()
+ }
+ }
+ }
+ }
+ }
+
override protected def stopServer(): Unit = {
countDownLatch.countDown()
}
@@ -165,6 +192,10 @@ object SparkSQLEngine extends Logging {
private val sparkSessionCreated = new AtomicBoolean(false)
+ // Kubernetes pod name max length - '-exec-' - Int.MAX_VALUE.length
+ // 253 - 6 - 10
+ val EXECUTOR_POD_NAME_PREFIX_MAX_LENGTH = 237
+
SignalRegister.registerLogger(logger)
setupConf()
@@ -189,7 +220,6 @@ object SparkSQLEngine extends Logging {
_kyuubiConf = KyuubiConf()
val rootDir = _sparkConf.getOption("spark.repl.classdir").getOrElse(getLocalDir(_sparkConf))
val outputDir = Utils.createTempDir(prefix = "repl", root = rootDir)
- _sparkConf.setIfMissing("spark.sql.execution.topKSortFallbackThreshold", "10000")
_sparkConf.setIfMissing("spark.sql.legacy.castComplexTypesToString.enabled", "true")
_sparkConf.setIfMissing("spark.master", "local")
_sparkConf.set(
@@ -223,7 +253,7 @@ object SparkSQLEngine extends Logging {
if (!isOnK8sClusterMode) {
// set driver host to ip instead of kyuubi pod name
- _sparkConf.set("spark.driver.host", InetAddress.getLocalHost.getHostAddress)
+ _sparkConf.setIfMissing("spark.driver.host", Utils.findLocalInetAddress.getHostAddress)
}
}
@@ -259,6 +289,7 @@ object SparkSQLEngine extends Logging {
KyuubiSparkUtil.initializeSparkSession(
session,
kyuubiConf.get(ENGINE_INITIALIZE_SQL) ++ kyuubiConf.get(ENGINE_SESSION_INITIALIZE_SQL))
+ session.sparkContext.setLocalProperty(KYUUBI_ENGINE_URL, KyuubiSparkUtil.engineUrl)
session
}
@@ -359,7 +390,7 @@ object SparkSQLEngine extends Logging {
private def startInitTimeoutChecker(startTime: Long, timeout: Long): Unit = {
val mainThread = Thread.currentThread()
- new Thread(
+ val checker = new Thread(
() => {
while (System.currentTimeMillis() - startTime < timeout && !sparkSessionCreated.get()) {
Thread.sleep(500)
@@ -368,7 +399,9 @@ object SparkSQLEngine extends Logging {
mainThread.interrupt()
}
},
- "CreateSparkTimeoutChecker").start()
+ "CreateSparkTimeoutChecker")
+ checker.setDaemon(true)
+ checker.start()
}
private def isOnK8sClusterMode: Boolean = {
@@ -390,8 +423,4 @@ object SparkSQLEngine extends Logging {
s"kyuubi-${UUID.randomUUID()}"
}
}
-
- // Kubernetes pod name max length - '-exec-' - Int.MAX_VALUE.length
- // 253 - 10 - 6
- val EXECUTOR_POD_NAME_PREFIX_MAX_LENGTH = 237
}
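Two watchdogs appear in this hunk: `startFastFailChecker` stops an idle CONNECTION-share-level engine, and `startInitTimeoutChecker` now runs on a daemon thread so it cannot keep the JVM alive after the main thread exits. A hedged sketch of the daemon interrupt-on-timeout pattern, with illustrative names:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Daemon watchdog: interrupt `target` if `started` is still false
// once `timeoutMs` elapses; exit quietly as soon as the flag flips.
object TimeoutChecker {
  def startChecker(started: AtomicBoolean, timeoutMs: Long, target: Thread): Thread = {
    val startTime = System.currentTimeMillis()
    val checker = new Thread(
      () => {
        while (System.currentTimeMillis() - startTime < timeoutMs && !started.get()) {
          Thread.sleep(10)
        }
        if (!started.get()) target.interrupt()
      },
      "init-timeout-checker")
    // daemon thread: never keeps the JVM alive on its own
    checker.setDaemon(true)
    checker.start()
    checker
  }
}
```

Marking the checker as a daemon is the essence of the `CreateSparkTimeoutChecker` change above: without it, a stuck SparkSession bootstrap could leave the checker thread pinning the process open.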
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendService.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendService.scala
index 854a28e85a1..c2563b32bce 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendService.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendService.scala
@@ -19,6 +19,7 @@ package org.apache.kyuubi.engine.spark
import scala.collection.JavaConverters._
+import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.security.token.{Token, TokenIdentifier}
@@ -33,6 +34,7 @@ import org.apache.kyuubi.ha.client.{EngineServiceDiscovery, ServiceDiscovery}
import org.apache.kyuubi.service.{Serverable, Service, TBinaryFrontendService}
import org.apache.kyuubi.service.TFrontendService._
import org.apache.kyuubi.util.KyuubiHadoopUtils
+import org.apache.kyuubi.util.reflect.DynConstructors
class SparkTBinaryFrontendService(
override val serverable: Serverable)
@@ -110,6 +112,8 @@ class SparkTBinaryFrontendService(
object SparkTBinaryFrontendService extends Logging {
val HIVE_DELEGATION_TOKEN = new Text("HIVE_DELEGATION_TOKEN")
+ val HIVE_CONF_CLASSNAME = "org.apache.hadoop.hive.conf.HiveConf"
+ @volatile private var _hiveConf: Configuration = _
private[spark] def renewDelegationToken(sc: SparkContext, delegationToken: String): Unit = {
val newCreds = KyuubiHadoopUtils.decodeCredentials(delegationToken)
@@ -133,7 +137,7 @@ object SparkTBinaryFrontendService extends Logging {
newTokens: Map[Text, Token[_ <: TokenIdentifier]],
oldCreds: Credentials,
updateCreds: Credentials): Unit = {
- val metastoreUris = sc.hadoopConfiguration.getTrimmed("hive.metastore.uris", "")
+ val metastoreUris = hiveConf(sc.hadoopConfiguration).getTrimmed("hive.metastore.uris", "")
// `HiveMetaStoreClient` selects the first token whose service is "" and kind is
// "HIVE_DELEGATION_TOKEN" to authenticate.
@@ -204,4 +208,25 @@ object SparkTBinaryFrontendService extends Logging {
1
}
}
+
+ private[kyuubi] def hiveConf(hadoopConf: Configuration): Configuration = {
+ if (_hiveConf == null) {
+ synchronized {
+ if (_hiveConf == null) {
+ _hiveConf =
+ try {
+ DynConstructors.builder()
+ .impl(HIVE_CONF_CLASSNAME, classOf[Configuration], classOf[Class[_]])
+ .build[Configuration]()
+ .newInstance(hadoopConf, Class.forName(HIVE_CONF_CLASSNAME))
+ } catch {
+ case e: Throwable =>
+ warn("Failed to create Hive Configuration, falling back to the Hadoop Configuration", e)
+ hadoopConf
+ }
+ }
+ }
+ }
+ _hiveConf
+ }
}
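`hiveConf` above lazily builds a `HiveConf` via reflection and falls back to the plain Hadoop `Configuration` on failure, guarded by double-checked locking on a `@volatile` field. The same shape in isolation (illustrative names, not Kyuubi API):

```scala
// Double-checked lazy initialization with a fallback value, mirroring
// the @volatile + synchronized structure of hiveConf above.
class LazyRef[T <: AnyRef](create: () => T, fallback: T) {
  @volatile private var value: T = _

  def get: T = {
    if (value == null) {        // fast path: no lock once initialized
      synchronized {
        if (value == null) {    // re-check under the lock
          value =
            try create()
            catch { case _: Throwable => fallback }
        }
      }
    }
    value
  }
}
```

The `@volatile` is what makes the unlocked first read safe on the JVM; the second null check ensures the failure-prone construction runs at most once even under contention.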
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecutePython.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecutePython.scala
index d2627fd99fd..badd835301a 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecutePython.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecutePython.scala
@@ -40,7 +40,7 @@ import org.apache.kyuubi.{KyuubiSQLException, Logging, Utils}
import org.apache.kyuubi.config.KyuubiConf.{ENGINE_SPARK_PYTHON_ENV_ARCHIVE, ENGINE_SPARK_PYTHON_ENV_ARCHIVE_EXEC_PATH, ENGINE_SPARK_PYTHON_HOME_ARCHIVE}
import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_SESSION_USER_KEY, KYUUBI_STATEMENT_ID_KEY}
import org.apache.kyuubi.engine.spark.KyuubiSparkUtil._
-import org.apache.kyuubi.operation.{ArrayFetchIterator, OperationState}
+import org.apache.kyuubi.operation.{ArrayFetchIterator, OperationHandle, OperationState}
import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
@@ -49,7 +49,8 @@ class ExecutePython(
override val statement: String,
override val shouldRunAsync: Boolean,
queryTimeout: Long,
- worker: SessionPythonWorker) extends SparkOperation(session) {
+ worker: SessionPythonWorker,
+ override protected val handle: OperationHandle) extends SparkOperation(session) {
private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
override def getOperationLog: Option[OperationLog] = Option(operationLog)
@@ -77,30 +78,31 @@ class ExecutePython(
OperationLog.removeCurrentOperationLog()
}
- private def executePython(): Unit = withLocalProperties {
+ private def executePython(): Unit =
try {
- setState(OperationState.RUNNING)
- info(diagnostics)
- addOperationListener()
- val response = worker.runCode(statement)
- val status = response.map(_.content.status).getOrElse("UNKNOWN_STATUS")
- if (PythonResponse.OK_STATUS.equalsIgnoreCase(status)) {
- val output = response.map(_.content.getOutput()).getOrElse("")
- val ename = response.map(_.content.getEname()).getOrElse("")
- val evalue = response.map(_.content.getEvalue()).getOrElse("")
- val traceback = response.map(_.content.getTraceback()).getOrElse(Seq.empty)
- iter =
- new ArrayFetchIterator[Row](Array(Row(output, status, ename, evalue, traceback)))
- setState(OperationState.FINISHED)
- } else {
- throw KyuubiSQLException(s"Interpret error:\n$statement\n $response")
+ withLocalProperties {
+ setState(OperationState.RUNNING)
+ info(diagnostics)
+ addOperationListener()
+ val response = worker.runCode(statement)
+ val status = response.map(_.content.status).getOrElse("UNKNOWN_STATUS")
+ if (PythonResponse.OK_STATUS.equalsIgnoreCase(status)) {
+ val output = response.map(_.content.getOutput()).getOrElse("")
+ val ename = response.map(_.content.getEname()).getOrElse("")
+ val evalue = response.map(_.content.getEvalue()).getOrElse("")
+ val traceback = response.map(_.content.getTraceback()).getOrElse(Seq.empty)
+ iter =
+ new ArrayFetchIterator[Row](Array(Row(output, status, ename, evalue, traceback)))
+ setState(OperationState.FINISHED)
+ } else {
+ throw KyuubiSQLException(s"Interpret error:\n$statement\n $response")
+ }
}
} catch {
onError(cancel = true)
} finally {
shutdownTimeoutMonitor()
}
- }
override protected def runInternal(): Unit = {
addTimeoutMonitor(queryTimeout)
@@ -180,12 +182,7 @@ case class SessionPythonWorker(
new BufferedReader(new InputStreamReader(workerProcess.getInputStream), 1)
private val lock = new ReentrantLock()
- private def withLockRequired[T](block: => T): T = {
- try {
- lock.lock()
- block
- } finally lock.unlock()
- }
+ private def withLockRequired[T](block: => T): T = Utils.withLockRequired(lock)(block)
/**
* Run the python code and return the response. This method may be invoked internally,
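Both this file and `KyuubiSparkILoop` now delegate to a shared `Utils.withLockRequired` helper instead of hand-rolled lock/unlock blocks. A minimal sketch of what such a helper presumably does — note it acquires the lock before entering `try`, so `unlock` only runs when `lock()` actually succeeded (the removed inline version locked inside the `try`):

```scala
import java.util.concurrent.locks.ReentrantLock

// Run `block` while holding `lock`, releasing it even on exception.
object LockUtils {
  def withLockRequired[T](lock: ReentrantLock)(block: => T): T = {
    lock.lock()          // acquire outside the try
    try block
    finally lock.unlock() // always release exactly once
  }
}
```

Centralizing the idiom keeps every caller's release path correct and makes the lock-inside-try bug impossible to reintroduce piecemeal.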
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteScala.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteScala.scala
index ff686cca0d0..691c4fb32d3 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteScala.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteScala.scala
@@ -31,7 +31,7 @@ import org.apache.spark.sql.types.StructType
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.engine.spark.KyuubiSparkUtil._
import org.apache.kyuubi.engine.spark.repl.KyuubiSparkILoop
-import org.apache.kyuubi.operation.{ArrayFetchIterator, OperationState}
+import org.apache.kyuubi.operation.{ArrayFetchIterator, OperationHandle, OperationState}
import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
@@ -51,7 +51,8 @@ class ExecuteScala(
repl: KyuubiSparkILoop,
override val statement: String,
override val shouldRunAsync: Boolean,
- queryTimeout: Long)
+ queryTimeout: Long,
+ override protected val handle: OperationHandle)
extends SparkOperation(session) {
private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
@@ -76,59 +77,60 @@ class ExecuteScala(
OperationLog.removeCurrentOperationLog()
}
- private def executeScala(): Unit = withLocalProperties {
+ private def executeScala(): Unit =
try {
- setState(OperationState.RUNNING)
- info(diagnostics)
- Thread.currentThread().setContextClassLoader(spark.sharedState.jarClassLoader)
- addOperationListener()
- val legacyOutput = repl.getOutput
- if (legacyOutput.nonEmpty) {
- warn(s"Clearing legacy output from last interpreting:\n $legacyOutput")
- }
- val replUrls = repl.classLoader.getParent.asInstanceOf[URLClassLoader].getURLs
- spark.sharedState.jarClassLoader.getURLs.filterNot(replUrls.contains).foreach { jar =>
- try {
- if ("file".equals(jar.toURI.getScheme)) {
- repl.addUrlsToClassPath(jar)
- } else {
- spark.sparkContext.addFile(jar.toString)
- val localJarFile = new File(SparkFiles.get(new Path(jar.toURI.getPath).getName))
- val localJarUrl = localJarFile.toURI.toURL
- if (!replUrls.contains(localJarUrl)) {
- repl.addUrlsToClassPath(localJarUrl)
+ withLocalProperties {
+ setState(OperationState.RUNNING)
+ info(diagnostics)
+ Thread.currentThread().setContextClassLoader(spark.sharedState.jarClassLoader)
+ addOperationListener()
+ val legacyOutput = repl.getOutput
+ if (legacyOutput.nonEmpty) {
+ warn(s"Clearing legacy output from the last interpretation:\n $legacyOutput")
+ }
+ val replUrls = repl.classLoader.getParent.asInstanceOf[URLClassLoader].getURLs
+ spark.sharedState.jarClassLoader.getURLs.filterNot(replUrls.contains).foreach { jar =>
+ try {
+ if ("file".equals(jar.toURI.getScheme)) {
+ repl.addUrlsToClassPath(jar)
+ } else {
+ spark.sparkContext.addFile(jar.toString)
+ val localJarFile = new File(SparkFiles.get(new Path(jar.toURI.getPath).getName))
+ val localJarUrl = localJarFile.toURI.toURL
+ if (!replUrls.contains(localJarUrl)) {
+ repl.addUrlsToClassPath(localJarUrl)
+ }
}
+ } catch {
+ case e: Throwable => error(s"Error adding $jar to repl class path", e)
}
- } catch {
- case e: Throwable => error(s"Error adding $jar to repl class path", e)
}
- }
- repl.interpretWithRedirectOutError(statement) match {
- case Success =>
- iter = {
- result = repl.getResult(statementId)
- if (result != null) {
- new ArrayFetchIterator[Row](result.collect())
- } else {
- val output = repl.getOutput
- debug("scala repl output:\n" + output)
- new ArrayFetchIterator[Row](Array(Row(output)))
+ repl.interpretWithRedirectOutError(statement) match {
+ case Success =>
+ iter = {
+ result = repl.getResult(statementId)
+ if (result != null) {
+ new ArrayFetchIterator[Row](result.collect())
+ } else {
+ val output = repl.getOutput
+ debug("scala repl output:\n" + output)
+ new ArrayFetchIterator[Row](Array(Row(output)))
+ }
}
- }
- case Error =>
- throw KyuubiSQLException(s"Interpret error:\n$statement\n ${repl.getOutput}")
- case Incomplete =>
- throw KyuubiSQLException(s"Incomplete code:\n$statement")
+ case Error =>
+ throw KyuubiSQLException(s"Interpret error:\n$statement\n ${repl.getOutput}")
+ case Incomplete =>
+ throw KyuubiSQLException(s"Incomplete code:\n$statement")
+ }
+ setState(OperationState.FINISHED)
}
- setState(OperationState.FINISHED)
} catch {
onError(cancel = true)
} finally {
repl.clearResult(statementId)
shutdownTimeoutMonitor()
}
- }
override protected def runInternal(): Unit = {
addTimeoutMonitor(queryTimeout)
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteStatement.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteStatement.scala
index b29d2ca9a7e..17d8a741269 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteStatement.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/ExecuteStatement.scala
@@ -21,15 +21,14 @@ import java.util.concurrent.RejectedExecutionException
import scala.collection.JavaConverters._
-import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
-import org.apache.spark.sql.execution.SQLExecution
-import org.apache.spark.sql.kyuubi.SparkDatasetHelper
+import org.apache.spark.sql.kyuubi.SparkDatasetHelper._
import org.apache.spark.sql.types._
import org.apache.kyuubi.{KyuubiSQLException, Logging}
import org.apache.kyuubi.config.KyuubiConf.OPERATION_RESULT_MAX_ROWS
import org.apache.kyuubi.engine.spark.KyuubiSparkUtil._
+import org.apache.kyuubi.engine.spark.session.SparkSessionImpl
import org.apache.kyuubi.operation.{ArrayFetchIterator, FetchIterator, IterableFetchIterator, OperationHandle, OperationState}
import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
@@ -77,22 +76,23 @@ class ExecuteStatement(
resultDF.take(maxRows)
}
- protected def executeStatement(): Unit = withLocalProperties {
+ protected def executeStatement(): Unit =
try {
- setState(OperationState.RUNNING)
- info(diagnostics)
- Thread.currentThread().setContextClassLoader(spark.sharedState.jarClassLoader)
- addOperationListener()
- result = spark.sql(statement)
- iter = collectAsIterator(result)
- setCompiledStateIfNeeded()
- setState(OperationState.FINISHED)
+ withLocalProperties {
+ setState(OperationState.RUNNING)
+ info(diagnostics)
+ Thread.currentThread().setContextClassLoader(spark.sharedState.jarClassLoader)
+ addOperationListener()
+ result = spark.sql(statement)
+ iter = collectAsIterator(result)
+ setCompiledStateIfNeeded()
+ setState(OperationState.FINISHED)
+ }
} catch {
onError(cancel = true)
} finally {
shutdownTimeoutMonitor()
}
- }
override protected def runInternal(): Unit = {
addTimeoutMonitor(queryTimeout)
@@ -186,35 +186,18 @@ class ArrowBasedExecuteStatement(
incrementalCollect,
handle) {
+ checkUseLargeVarType()
+
override protected def incrementalCollectResult(resultDF: DataFrame): Iterator[Any] = {
- collectAsArrow(convertComplexType(resultDF)) { rdd =>
- rdd.toLocalIterator
- }
+ toArrowBatchLocalIterator(convertComplexType(resultDF))
}
override protected def fullCollectResult(resultDF: DataFrame): Array[_] = {
- collectAsArrow(convertComplexType(resultDF)) { rdd =>
- rdd.collect()
- }
+ executeCollect(convertComplexType(resultDF))
}
override protected def takeResult(resultDF: DataFrame, maxRows: Int): Array[_] = {
- // this will introduce shuffle and hurt performance
- val limitedResult = resultDF.limit(maxRows)
- collectAsArrow(convertComplexType(limitedResult)) { rdd =>
- rdd.collect()
- }
- }
-
- /**
- * refer to org.apache.spark.sql.Dataset#withAction(), assign a new execution id for arrow-based
- * operation, so that we can track the arrow-based queries on the UI tab.
- */
- private def collectAsArrow[T](df: DataFrame)(action: RDD[Array[Byte]] => T): T = {
- SQLExecution.withNewExecutionId(df.queryExecution, Some("collectAsArrow")) {
- df.queryExecution.executedPlan.resetMetrics()
- action(SparkDatasetHelper.toArrowBatchRdd(df))
- }
+ executeCollect(convertComplexType(resultDF.limit(maxRows)))
}
override protected def isArrowBasedOperation: Boolean = true
@@ -222,7 +205,19 @@ class ArrowBasedExecuteStatement(
override val resultFormat = "arrow"
private def convertComplexType(df: DataFrame): DataFrame = {
- SparkDatasetHelper.convertTopLevelComplexTypeToHiveString(df, timestampAsString)
+ convertTopLevelComplexTypeToHiveString(df, timestampAsString)
}
+ def checkUseLargeVarType(): Unit = {
+ // TODO: largeVarType support, see SPARK-39979.
+ val useLargeVarType = session.asInstanceOf[SparkSessionImpl].spark
+ .conf
+ .get("spark.sql.execution.arrow.useLargeVarType", "false")
+ .toBoolean
+ if (useLargeVarType) {
+ throw new KyuubiSQLException(
+ "`spark.sql.execution.arrow.useLargeVarType = true` is not supported yet.",
+ null)
+ }
+ }
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCatalogs.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCatalogs.scala
index 6d818e53ed7..c8e58730096 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCatalogs.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCatalogs.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.IterableFetchIterator
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_CAT
import org.apache.kyuubi.session.Session
@@ -33,7 +33,7 @@ class GetCatalogs(session: Session) extends SparkOperation(session) {
override protected def runInternal(): Unit = {
try {
- iter = new IterableFetchIterator(SparkCatalogShim().getCatalogs(spark).toList)
+ iter = new IterableFetchIterator(SparkCatalogUtils.getCatalogs(spark))
} catch onError()
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetColumns.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetColumns.scala
index e785169812f..3a0ab7d5ba6 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetColumns.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetColumns.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types._
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.IterableFetchIterator
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
import org.apache.kyuubi.session.Session
@@ -115,8 +115,8 @@ class GetColumns(
val schemaPattern = toJavaRegex(schemaName)
val tablePattern = toJavaRegex(tableName)
val columnPattern = toJavaRegex(columnName)
- iter = new IterableFetchIterator(SparkCatalogShim()
- .getColumns(spark, catalogName, schemaPattern, tablePattern, columnPattern).toList)
+ iter = new IterableFetchIterator(SparkCatalogUtils
+ .getColumns(spark, catalogName, schemaPattern, tablePattern, columnPattern))
} catch {
onError()
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentCatalog.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentCatalog.scala
index 66d707ec033..1d85d3d5adc 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentCatalog.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentCatalog.scala
@@ -17,15 +17,20 @@
package org.apache.kyuubi.engine.spark.operation
+import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
import org.apache.kyuubi.operation.IterableFetchIterator
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_CAT
import org.apache.kyuubi.session.Session
class GetCurrentCatalog(session: Session) extends SparkOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def resultSchema: StructType = {
new StructType()
.add(TABLE_CAT, "string", nullable = true, "Catalog name.")
@@ -33,7 +38,8 @@ class GetCurrentCatalog(session: Session) extends SparkOperation(session) {
override protected def runInternal(): Unit = {
try {
- iter = new IterableFetchIterator(Seq(SparkCatalogShim().getCurrentCatalog(spark)))
+ val currentCatalogName = spark.sessionState.catalogManager.currentCatalog.name()
+ iter = new IterableFetchIterator(Seq(Row(currentCatalogName)))
} catch onError()
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentDatabase.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentDatabase.scala
index bcf3ad2a5f0..2478fb6a49a 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentDatabase.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetCurrentDatabase.scala
@@ -17,15 +17,21 @@
package org.apache.kyuubi.engine.spark.operation
+import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils.quoteIfNeeded
import org.apache.kyuubi.operation.IterableFetchIterator
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_SCHEM
import org.apache.kyuubi.session.Session
class GetCurrentDatabase(session: Session) extends SparkOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def resultSchema: StructType = {
new StructType()
.add(TABLE_SCHEM, "string", nullable = true, "Schema name.")
@@ -33,7 +39,9 @@ class GetCurrentDatabase(session: Session) extends SparkOperation(session) {
override protected def runInternal(): Unit = {
try {
- iter = new IterableFetchIterator(Seq(SparkCatalogShim().getCurrentDatabase(spark)))
+ val currentDatabaseName =
+ spark.sessionState.catalogManager.currentNamespace.map(quoteIfNeeded).mkString(".")
+ iter = new IterableFetchIterator(Seq(Row(currentDatabaseName)))
} catch onError()
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetSchemas.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetSchemas.scala
index 3937f528d63..46dc7634acf 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetSchemas.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetSchemas.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.IterableFetchIterator
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
import org.apache.kyuubi.session.Session
@@ -40,7 +40,7 @@ class GetSchemas(session: Session, catalogName: String, schema: String)
override protected def runInternal(): Unit = {
try {
val schemaPattern = toJavaRegex(schema)
- val rows = SparkCatalogShim().getSchemas(spark, catalogName, schemaPattern)
+ val rows = SparkCatalogUtils.getSchemas(spark, catalogName, schemaPattern)
iter = new IterableFetchIterator(rows)
} catch onError()
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTableTypes.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTableTypes.scala
index 1d2cae3815f..1029175b21f 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTableTypes.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTableTypes.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.IterableFetchIterator
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
import org.apache.kyuubi.session.Session
@@ -33,6 +33,6 @@ class GetTableTypes(session: Session)
}
override protected def runInternal(): Unit = {
- iter = new IterableFetchIterator(SparkCatalogShim.sparkTableTypes.map(Row(_)).toList)
+ iter = new IterableFetchIterator(SparkCatalogUtils.sparkTableTypes.map(Row(_)).toList)
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTables.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTables.scala
index 40642b825b9..980e4fdb173 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTables.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/GetTables.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types.StructType
import org.apache.kyuubi.config.KyuubiConf.OPERATION_GET_TABLES_IGNORE_TABLE_PROPERTIES
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.IterableFetchIterator
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
import org.apache.kyuubi.session.Session
@@ -73,9 +73,8 @@ class GetTables(
try {
val schemaPattern = toJavaRegex(schema)
val tablePattern = toJavaRegex(tableName)
- val sparkShim = SparkCatalogShim()
val catalogTablesAndViews =
- sparkShim.getCatalogTablesOrViews(
+ SparkCatalogUtils.getCatalogTablesOrViews(
spark,
catalog,
schemaPattern,
@@ -86,7 +85,7 @@ class GetTables(
val allTableAndViews =
if (tableTypes.exists("VIEW".equalsIgnoreCase)) {
catalogTablesAndViews ++
- sparkShim.getTempViews(spark, catalog, schemaPattern, tablePattern)
+ SparkCatalogUtils.getTempViews(spark, catalog, schemaPattern, tablePattern)
} else {
catalogTablesAndViews
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/PlanOnlyStatement.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/PlanOnlyStatement.scala
index b7e5451ece2..4f88083130a 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/PlanOnlyStatement.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/PlanOnlyStatement.scala
@@ -17,14 +17,17 @@
package org.apache.kyuubi.engine.spark.operation
-import org.apache.spark.sql.Row
+import com.fasterxml.jackson.databind.ObjectMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+import org.apache.spark.kyuubi.SparkUtilsHelper
+import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.StructType
import org.apache.kyuubi.KyuubiSQLException
-import org.apache.kyuubi.config.KyuubiConf.{OPERATION_PLAN_ONLY_EXCLUDES, OPERATION_PLAN_ONLY_OUT_STYLE}
-import org.apache.kyuubi.operation.{AnalyzeMode, ArrayFetchIterator, ExecutionMode, IterableFetchIterator, JsonStyle, OptimizeMode, OptimizeWithStatsMode, ParseMode, PhysicalMode, PlainStyle, PlanOnlyMode, PlanOnlyStyle, UnknownMode, UnknownStyle}
+import org.apache.kyuubi.config.KyuubiConf.{LINEAGE_PARSER_PLUGIN_PROVIDER, OPERATION_PLAN_ONLY_EXCLUDES, OPERATION_PLAN_ONLY_OUT_STYLE}
+import org.apache.kyuubi.operation.{AnalyzeMode, ArrayFetchIterator, ExecutionMode, IterableFetchIterator, JsonStyle, LineageMode, OperationHandle, OptimizeMode, OptimizeWithStatsMode, ParseMode, PhysicalMode, PlainStyle, PlanOnlyMode, PlanOnlyStyle, UnknownMode, UnknownStyle}
import org.apache.kyuubi.operation.PlanOnlyMode.{notSupportedModeError, unknownModeError}
import org.apache.kyuubi.operation.PlanOnlyStyle.{notSupportedStyleError, unknownStyleError}
import org.apache.kyuubi.operation.log.OperationLog
@@ -36,12 +39,13 @@ import org.apache.kyuubi.session.Session
class PlanOnlyStatement(
session: Session,
override val statement: String,
- mode: PlanOnlyMode)
+ mode: PlanOnlyMode,
+ override protected val handle: OperationHandle)
extends SparkOperation(session) {
private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
- private val planExcludes: Seq[String] = {
- spark.conf.getOption(OPERATION_PLAN_ONLY_EXCLUDES.key).map(_.split(",").map(_.trim).toSeq)
+ private val planExcludes: Set[String] = {
+ spark.conf.getOption(OPERATION_PLAN_ONLY_EXCLUDES.key).map(_.split(",").map(_.trim).toSet)
.getOrElse(session.sessionManager.getConf.get(OPERATION_PLAN_ONLY_EXCLUDES))
}
@@ -65,28 +69,29 @@ class PlanOnlyStatement(
super.beforeRun()
}
- override protected def runInternal(): Unit = withLocalProperties {
+ override protected def runInternal(): Unit =
try {
- SQLConf.withExistingConf(spark.sessionState.conf) {
- val parsed = spark.sessionState.sqlParser.parsePlan(statement)
-
- parsed match {
- case cmd if planExcludes.contains(cmd.getClass.getSimpleName) =>
- result = spark.sql(statement)
- iter = new ArrayFetchIterator(result.collect())
-
- case plan => style match {
- case PlainStyle => explainWithPlainStyle(plan)
- case JsonStyle => explainWithJsonStyle(plan)
- case UnknownStyle => unknownStyleError(style)
- case other => throw notSupportedStyleError(other, "Spark SQL")
- }
+ withLocalProperties {
+ SQLConf.withExistingConf(spark.sessionState.conf) {
+ val parsed = spark.sessionState.sqlParser.parsePlan(statement)
+
+ parsed match {
+ case cmd if planExcludes.contains(cmd.getClass.getSimpleName) =>
+ result = spark.sql(statement)
+ iter = new ArrayFetchIterator(result.collect())
+
+ case plan => style match {
+ case PlainStyle => explainWithPlainStyle(plan)
+ case JsonStyle => explainWithJsonStyle(plan)
+ case UnknownStyle => unknownStyleError(style)
+ case other => throw notSupportedStyleError(other, "Spark SQL")
+ }
+ }
}
}
} catch {
onError()
}
- }
private def explainWithPlainStyle(plan: LogicalPlan): Unit = {
mode match {
@@ -117,6 +122,9 @@ class PlanOnlyStatement(
case ExecutionMode =>
val executed = spark.sql(statement).queryExecution.executedPlan
iter = new IterableFetchIterator(Seq(Row(executed.toString())))
+ case LineageMode =>
+ val result = parseLineage(spark, plan)
+ iter = new IterableFetchIterator(Seq(Row(result)))
case UnknownMode => throw unknownModeError(mode)
case _ => throw notSupportedModeError(mode, "Spark SQL")
}
@@ -141,10 +149,39 @@ class PlanOnlyStatement(
case ExecutionMode =>
val executed = spark.sql(statement).queryExecution.executedPlan
iter = new IterableFetchIterator(Seq(Row(executed.toJSON)))
+ case LineageMode =>
+ val result = parseLineage(spark, plan)
+ iter = new IterableFetchIterator(Seq(Row(result)))
case UnknownMode => throw unknownModeError(mode)
case _ =>
throw KyuubiSQLException(s"The operation mode $mode" +
" doesn't support in Spark SQL engine.")
}
}
+
+ private def parseLineage(spark: SparkSession, plan: LogicalPlan): String = {
+ val analyzed = spark.sessionState.analyzer.execute(plan)
+ spark.sessionState.analyzer.checkAnalysis(analyzed)
+ val optimized = spark.sessionState.optimizer.execute(analyzed)
+ val parserProviderClass = session.sessionManager.getConf.get(LINEAGE_PARSER_PLUGIN_PROVIDER)
+
+ try {
+ if (!SparkUtilsHelper.classesArePresent(
+ parserProviderClass)) {
+ throw new Exception(s"'$parserProviderClass' not found," +
+ " need to install kyuubi-spark-lineage plugin before using the 'lineage' mode")
+ }
+
+ val lineage = Class.forName(parserProviderClass)
+ .getMethod("parse", classOf[SparkSession], classOf[LogicalPlan])
+ .invoke(null, spark, optimized)
+
+ val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
+ mapper.writeValueAsString(lineage)
+ } catch {
+ case e: Throwable =>
+ throw KyuubiSQLException(s"Extract columns lineage failed: ${e.getMessage}", e)
+ }
+ }
+
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentCatalog.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentCatalog.scala
index 4e8c0aa69a4..88105b086a9 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentCatalog.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentCatalog.scala
@@ -19,18 +19,23 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentCatalog(session: Session, catalog: String) extends SparkOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def resultSchema: StructType = {
new StructType()
}
override protected def runInternal(): Unit = {
try {
- SparkCatalogShim().setCurrentCatalog(spark, catalog)
+ SparkCatalogUtils.setCurrentCatalog(spark, catalog)
setHasResultSet(false)
} catch onError()
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentDatabase.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentDatabase.scala
index 0a21bc83965..d227f5fd2ad 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentDatabase.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SetCurrentDatabase.scala
@@ -19,19 +19,23 @@ package org.apache.kyuubi.engine.spark.operation
import org.apache.spark.sql.types.StructType
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentDatabase(session: Session, database: String)
extends SparkOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def resultSchema: StructType = {
new StructType()
}
override protected def runInternal(): Unit = {
try {
- SparkCatalogShim().setCurrentDatabase(spark, database)
+ spark.sessionState.catalogManager.setCurrentNamespace(Array(database))
setHasResultSet(false)
} catch onError()
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala
index eb58407d47c..1de360f0715 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkOperation.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.engine.spark.operation
import java.io.IOException
import java.time.ZoneId
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TProgressUpdateResp, TRowSet}
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp, TProgressUpdateResp, TRowSet}
import org.apache.spark.kyuubi.{SparkProgressMonitor, SQLOperationListener}
import org.apache.spark.kyuubi.SparkUtilsHelper.redact
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
@@ -101,13 +101,13 @@ abstract class SparkOperation(session: Session)
super.getStatus
}
- override def cleanup(targetState: OperationState): Unit = state.synchronized {
+ override def cleanup(targetState: OperationState): Unit = withLockRequired {
operationListener.foreach(_.cleanup())
if (!isTerminalState(state)) {
setState(targetState)
Option(getBackgroundHandle).foreach(_.cancel(true))
- if (!spark.sparkContext.isStopped) spark.sparkContext.cancelJobGroup(statementId)
}
+ if (!spark.sparkContext.isStopped) spark.sparkContext.cancelJobGroup(statementId)
}
protected val forceCancel =
@@ -174,15 +174,16 @@ abstract class SparkOperation(session: Session)
// could be thrown.
case e: Throwable =>
if (cancel && !spark.sparkContext.isStopped) spark.sparkContext.cancelJobGroup(statementId)
- state.synchronized {
+ withLockRequired {
val errMsg = Utils.stringifyException(e)
if (state == OperationState.TIMEOUT) {
val ke = KyuubiSQLException(s"Timeout operating $opType: $errMsg")
setOperationException(ke)
throw ke
} else if (isTerminalState(state)) {
- setOperationException(KyuubiSQLException(errMsg))
- warn(s"Ignore exception in terminal state with $statementId: $errMsg")
+ val ke = KyuubiSQLException(errMsg)
+ setOperationException(ke)
+ throw ke
} else {
error(s"Error operating $opType: $errMsg", e)
val ke = KyuubiSQLException(s"Error operating $opType: $errMsg", e)
@@ -200,7 +201,7 @@ abstract class SparkOperation(session: Session)
}
override protected def afterRun(): Unit = {
- state.synchronized {
+ withLockRequired {
if (!isTerminalState(state)) {
setState(OperationState.FINISHED)
}
@@ -232,10 +233,12 @@ abstract class SparkOperation(session: Session)
resp
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet =
- withLocalProperties {
- var resultRowSet: TRowSet = null
- try {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
+ var resultRowSet: TRowSet = null
+ try {
+ withLocalProperties {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
@@ -260,10 +263,14 @@ abstract class SparkOperation(session: Session)
getProtocolVersion)
}
resultRowSet.setStartRowOffset(iter.getPosition)
- } catch onError(cancel = true)
+ }
+ } catch onError(cancel = true)
- resultRowSet
- }
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(resultRowSet)
+ resp.setHasMoreRows(false)
+ resp
+ }
override def shouldRunAsync: Boolean = false
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkSQLOperationManager.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkSQLOperationManager.scala
index 8fd58b33875..ab082874630 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkSQLOperationManager.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/operation/SparkSQLOperationManager.scala
@@ -26,7 +26,7 @@ import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_OPERATION_HANDLE_KEY
import org.apache.kyuubi.engine.spark.repl.KyuubiSparkILoop
import org.apache.kyuubi.engine.spark.session.SparkSessionImpl
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.{NoneMode, Operation, OperationHandle, OperationManager, PlanOnlyMode}
import org.apache.kyuubi.session.{Session, SessionHandle}
@@ -106,18 +106,18 @@ class SparkSQLOperationManager private (name: String) extends OperationManager(n
opHandle)
}
case mode =>
- new PlanOnlyStatement(session, statement, mode)
+ new PlanOnlyStatement(session, statement, mode, opHandle)
}
case OperationLanguages.SCALA =>
val repl = sessionToRepl.getOrElseUpdate(session.handle, KyuubiSparkILoop(spark))
- new ExecuteScala(session, repl, statement, runAsync, queryTimeout)
+ new ExecuteScala(session, repl, statement, runAsync, queryTimeout, opHandle)
case OperationLanguages.PYTHON =>
try {
ExecutePython.init()
val worker = sessionToPythonProcess.getOrElseUpdate(
session.handle,
ExecutePython.createSessionPythonWorker(spark, session))
- new ExecutePython(session, statement, runAsync, queryTimeout, worker)
+ new ExecutePython(session, statement, runAsync, queryTimeout, worker, opHandle)
} catch {
case e: Throwable =>
spark.conf.set(OPERATION_LANGUAGE.key, OperationLanguages.SQL.toString)
@@ -179,7 +179,7 @@ class SparkSQLOperationManager private (name: String) extends OperationManager(n
tableTypes: java.util.List[String]): Operation = {
val tTypes =
if (tableTypes == null || tableTypes.isEmpty) {
- SparkCatalogShim.sparkTableTypes
+ SparkCatalogUtils.sparkTableTypes
} else {
tableTypes.asScala.toSet
}
@@ -231,6 +231,6 @@ class SparkSQLOperationManager private (name: String) extends OperationManager(n
}
override def getQueryId(operation: Operation): String = {
- throw KyuubiSQLException.featureNotSupported()
+ operation.getHandle.identifier.toString
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/session/SparkSessionImpl.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/session/SparkSessionImpl.scala
index 78164ff5fab..8d9012cbdc6 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/session/SparkSessionImpl.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/session/SparkSessionImpl.scala
@@ -17,6 +17,7 @@
package org.apache.kyuubi.engine.spark.session
+import org.apache.commons.lang3.StringUtils
import org.apache.hive.service.rpc.thrift.{TGetInfoType, TGetInfoValue, TProtocolVersion}
import org.apache.spark.sql.{AnalysisException, SparkSession}
@@ -24,11 +25,11 @@ import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_HANDLE_KEY
import org.apache.kyuubi.engine.spark.events.SessionEvent
import org.apache.kyuubi.engine.spark.operation.SparkSQLOperationManager
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
import org.apache.kyuubi.engine.spark.udf.KDFRegistry
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.events.EventBus
import org.apache.kyuubi.operation.{Operation, OperationHandle}
-import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager}
+import org.apache.kyuubi.session._
class SparkSessionImpl(
protocol: TProtocolVersion,
@@ -54,22 +55,35 @@ class SparkSessionImpl(
private val sessionEvent = SessionEvent(this)
override def open(): Unit = {
- normalizedConf.foreach {
- case ("use:catalog", catalog) =>
- try {
- SparkCatalogShim().setCurrentCatalog(spark, catalog)
- } catch {
- case e if e.getMessage.contains("Cannot find catalog plugin class for catalog") =>
- warn(e.getMessage())
- }
- case ("use:database", database) =>
- try {
- SparkCatalogShim().setCurrentDatabase(spark, database)
- } catch {
- case e
- if database == "default" && e.getMessage != null &&
- e.getMessage.contains("not found") =>
- }
+
+ val (useCatalogAndDatabaseConf, otherConf) = normalizedConf.partition { case (k, _) =>
+ Array(USE_CATALOG, USE_DATABASE).contains(k)
+ }
+
+ useCatalogAndDatabaseConf.get(USE_CATALOG).foreach { catalog =>
+ try {
+ SparkCatalogUtils.setCurrentCatalog(spark, catalog)
+ } catch {
+ case e if e.getMessage.contains("Cannot find catalog plugin class for catalog") =>
+ warn(e.getMessage())
+ }
+ }
+
+ useCatalogAndDatabaseConf.get("use:database").foreach { database =>
+ try {
+ spark.sessionState.catalogManager.setCurrentNamespace(Array(database))
+ } catch {
+ case e
+ if database == "default" &&
+ StringUtils.containsAny(
+ e.getMessage,
+ "not found",
+ "SCHEMA_NOT_FOUND",
+ "is not authorized to perform: glue:GetDatabase") =>
+ }
+ }
+
+ otherConf.foreach {
case (key, value) => setModifiableConfig(key, value)
}
KDFRegistry.registerAll(spark)
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v2_4.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v2_4.scala
deleted file mode 100644
index ea72dd1563c..00000000000
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v2_4.scala
+++ /dev/null
@@ -1,184 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.engine.spark.shim
-
-import java.util.regex.Pattern
-
-import org.apache.spark.sql.{Row, SparkSession}
-import org.apache.spark.sql.catalyst.TableIdentifier
-
-class CatalogShim_v2_4 extends SparkCatalogShim {
-
- override def getCatalogs(spark: SparkSession): Seq[Row] = {
- Seq(Row(SparkCatalogShim.SESSION_CATALOG))
- }
-
- override protected def catalogExists(spark: SparkSession, catalog: String): Boolean = false
-
- override def setCurrentCatalog(spark: SparkSession, catalog: String): Unit = {}
-
- override def getCurrentCatalog(spark: SparkSession): Row = {
- Row(SparkCatalogShim.SESSION_CATALOG)
- }
-
- override def getSchemas(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String): Seq[Row] = {
- (spark.sessionState.catalog.listDatabases(schemaPattern) ++
- getGlobalTempViewManager(spark, schemaPattern)).map(Row(_, SparkCatalogShim.SESSION_CATALOG))
- }
-
- def setCurrentDatabase(spark: SparkSession, databaseName: String): Unit = {
- spark.sessionState.catalog.setCurrentDatabase(databaseName)
- }
-
- def getCurrentDatabase(spark: SparkSession): Row = {
- Row(spark.sessionState.catalog.getCurrentDatabase)
- }
-
- override protected def getGlobalTempViewManager(
- spark: SparkSession,
- schemaPattern: String): Seq[String] = {
- val database = spark.sharedState.globalTempViewManager.database
- Option(database).filter(_.matches(schemaPattern)).toSeq
- }
-
- override def getCatalogTablesOrViews(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- tableTypes: Set[String],
- ignoreTableProperties: Boolean): Seq[Row] = {
- val catalog = spark.sessionState.catalog
- val databases = catalog.listDatabases(schemaPattern)
-
- databases.flatMap { db =>
- val identifiers = catalog.listTables(db, tablePattern, includeLocalTempViews = false)
- catalog.getTablesByName(identifiers)
- .filter(t => matched(tableTypes, t.tableType.name)).map { t =>
- val typ = if (t.tableType.name == "VIEW") "VIEW" else "TABLE"
- Row(
- catalogName,
- t.database,
- t.identifier.table,
- typ,
- t.comment.getOrElse(""),
- null,
- null,
- null,
- null,
- null)
- }
- }
- }
-
- override def getTempViews(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String): Seq[Row] = {
- val views = getViews(spark, schemaPattern, tablePattern)
- views.map { ident =>
- Row(catalogName, ident.database.orNull, ident.table, "VIEW", "", null, null, null, null, null)
- }
- }
-
- override protected def getViews(
- spark: SparkSession,
- schemaPattern: String,
- tablePattern: String): Seq[TableIdentifier] = {
- val db = getGlobalTempViewManager(spark, schemaPattern)
- if (db.nonEmpty) {
- spark.sessionState.catalog.listTables(db.head, tablePattern)
- } else {
- spark.sessionState.catalog.listLocalTempViews(tablePattern)
- }
- }
-
- override def getColumns(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- columnPattern: String): Seq[Row] = {
-
- val cp = columnPattern.r.pattern
- val byCatalog = getColumnsByCatalog(spark, catalogName, schemaPattern, tablePattern, cp)
- val byGlobalTmpDB = getColumnsByGlobalTempViewManager(spark, schemaPattern, tablePattern, cp)
- val byLocalTmp = getColumnsByLocalTempViews(spark, tablePattern, cp)
-
- byCatalog ++ byGlobalTmpDB ++ byLocalTmp
- }
-
- protected def getColumnsByCatalog(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- columnPattern: Pattern): Seq[Row] = {
- val catalog = spark.sessionState.catalog
-
- val databases = catalog.listDatabases(schemaPattern)
-
- databases.flatMap { db =>
- val identifiers = catalog.listTables(db, tablePattern, includeLocalTempViews = true)
- catalog.getTablesByName(identifiers).flatMap { t =>
- t.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
- .map { case (f, i) => toColumnResult(catalogName, t.database, t.identifier.table, f, i) }
- }
- }
- }
-
- protected def getColumnsByGlobalTempViewManager(
- spark: SparkSession,
- schemaPattern: String,
- tablePattern: String,
- columnPattern: Pattern): Seq[Row] = {
- val catalog = spark.sessionState.catalog
-
- getGlobalTempViewManager(spark, schemaPattern).flatMap { globalTmpDb =>
- catalog.globalTempViewManager.listViewNames(tablePattern).flatMap { v =>
- catalog.globalTempViewManager.get(v).map { plan =>
- plan.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
- .map { case (f, i) =>
- toColumnResult(SparkCatalogShim.SESSION_CATALOG, globalTmpDb, v, f, i)
- }
- }
- }.flatten
- }
- }
-
- protected def getColumnsByLocalTempViews(
- spark: SparkSession,
- tablePattern: String,
- columnPattern: Pattern): Seq[Row] = {
- val catalog = spark.sessionState.catalog
-
- catalog.listLocalTempViews(tablePattern)
- .map(v => (v, catalog.getTempView(v.table).get))
- .flatMap { case (v, plan) =>
- plan.schema.zipWithIndex
- .filter(f => columnPattern.matcher(f._1.name).matches())
- .map { case (f, i) =>
- toColumnResult(SparkCatalogShim.SESSION_CATALOG, null, v.table, f, i)
- }
- }
- }
-}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v3_0.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v3_0.scala
deleted file mode 100644
index 27c524f3032..00000000000
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/CatalogShim_v3_0.scala
+++ /dev/null
@@ -1,216 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.engine.spark.shim
-
-import java.util.regex.Pattern
-
-import org.apache.spark.sql.{Row, SparkSession}
-import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogPlugin, SupportsNamespaces, TableCatalog}
-
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim.SESSION_CATALOG
-
-class CatalogShim_v3_0 extends CatalogShim_v2_4 {
-
- override def getCatalogs(spark: SparkSession): Seq[Row] = {
-
- // A [[CatalogManager]] is session unique
- val catalogMgr = spark.sessionState.catalogManager
- // get the custom v2 session catalog or default spark_catalog
- val sessionCatalog = invoke(catalogMgr, "v2SessionCatalog")
- val defaultCatalog = catalogMgr.currentCatalog
-
- val defaults = Seq(sessionCatalog, defaultCatalog).distinct
- .map(invoke(_, "name").asInstanceOf[String])
- val catalogs = getField(catalogMgr, "catalogs")
- .asInstanceOf[scala.collection.Map[String, _]]
- (catalogs.keys ++: defaults).distinct.map(Row(_))
- }
-
- private def getCatalog(spark: SparkSession, catalogName: String): CatalogPlugin = {
- val catalogManager = spark.sessionState.catalogManager
- if (catalogName == null || catalogName.isEmpty) {
- catalogManager.currentCatalog
- } else {
- catalogManager.catalog(catalogName)
- }
- }
-
- override def catalogExists(spark: SparkSession, catalog: String): Boolean = {
- spark.sessionState.catalogManager.isCatalogRegistered(catalog)
- }
-
- override def setCurrentCatalog(spark: SparkSession, catalog: String): Unit = {
- // SPARK-36841(3.3.0) Ensure setCurrentCatalog method catalog must exist
- if (spark.sessionState.catalogManager.isCatalogRegistered(catalog)) {
- spark.sessionState.catalogManager.setCurrentCatalog(catalog)
- } else {
- throw new IllegalArgumentException(s"Cannot find catalog plugin class for catalog '$catalog'")
- }
- }
-
- override def getCurrentCatalog(spark: SparkSession): Row = {
- Row(spark.sessionState.catalogManager.currentCatalog.name())
- }
-
- private def listAllNamespaces(
- catalog: SupportsNamespaces,
- namespaces: Array[Array[String]]): Array[Array[String]] = {
- val children = namespaces.flatMap { ns =>
- catalog.listNamespaces(ns)
- }
- if (children.isEmpty) {
- namespaces
- } else {
- namespaces ++: listAllNamespaces(catalog, children)
- }
- }
-
- private def listAllNamespaces(catalog: CatalogPlugin): Array[Array[String]] = {
- catalog match {
- case catalog: CatalogExtension =>
- // DSv2 does not support pass schemaPattern transparently
- catalog.defaultNamespace() +: catalog.listNamespaces(Array())
- case catalog: SupportsNamespaces =>
- val rootSchema = catalog.listNamespaces()
- val allSchemas = listAllNamespaces(catalog, rootSchema)
- allSchemas
- }
- }
-
- /**
- * Forked from Apache Spark's org.apache.spark.sql.connector.catalog.CatalogV2Implicits
- */
- private def quoteIfNeeded(part: String): String = {
- if (part.contains(".") || part.contains("`")) {
- s"`${part.replace("`", "``")}`"
- } else {
- part
- }
- }
-
- private def listNamespacesWithPattern(
- catalog: CatalogPlugin,
- schemaPattern: String): Array[Array[String]] = {
- val p = schemaPattern.r.pattern
- listAllNamespaces(catalog).filter { ns =>
- val quoted = ns.map(quoteIfNeeded).mkString(".")
- p.matcher(quoted).matches()
- }.distinct
- }
-
- private def getSchemasWithPattern(catalog: CatalogPlugin, schemaPattern: String): Seq[String] = {
- val p = schemaPattern.r.pattern
- listAllNamespaces(catalog).flatMap { ns =>
- val quoted = ns.map(quoteIfNeeded).mkString(".")
- if (p.matcher(quoted).matches()) {
- Some(quoted)
- } else {
- None
- }
- }.distinct
- }
-
- override def getSchemas(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String): Seq[Row] = {
- if (catalogName == SparkCatalogShim.SESSION_CATALOG) {
- super.getSchemas(spark, catalogName, schemaPattern)
- } else {
- val catalog = getCatalog(spark, catalogName)
- getSchemasWithPattern(catalog, schemaPattern).map(Row(_, catalog.name))
- }
- }
-
- override def setCurrentDatabase(spark: SparkSession, databaseName: String): Unit = {
- spark.sessionState.catalogManager.setCurrentNamespace(Array(databaseName))
- }
-
- override def getCurrentDatabase(spark: SparkSession): Row = {
- Row(spark.sessionState.catalogManager.currentNamespace.map(quoteIfNeeded).mkString("."))
- }
-
- override def getCatalogTablesOrViews(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- tableTypes: Set[String],
- ignoreTableProperties: Boolean = false): Seq[Row] = {
- val catalog = getCatalog(spark, catalogName)
- val namespaces = listNamespacesWithPattern(catalog, schemaPattern)
- catalog match {
- case builtin if builtin.name() == SESSION_CATALOG =>
- super.getCatalogTablesOrViews(
- spark,
- SESSION_CATALOG,
- schemaPattern,
- tablePattern,
- tableTypes,
- ignoreTableProperties)
- case tc: TableCatalog =>
- val tp = tablePattern.r.pattern
- val identifiers = namespaces.flatMap { ns =>
- tc.listTables(ns).filter(i => tp.matcher(quoteIfNeeded(i.name())).matches())
- }
- identifiers.map { ident =>
- // TODO: restore view type for session catalog
- val comment = if (ignoreTableProperties) ""
- else tc.loadTable(ident).properties().getOrDefault(TableCatalog.PROP_COMMENT, "")
- val schema = ident.namespace().map(quoteIfNeeded).mkString(".")
- val tableName = quoteIfNeeded(ident.name())
- Row(catalog.name(), schema, tableName, "TABLE", comment, null, null, null, null, null)
- }
- case _ => Seq.empty[Row]
- }
- }
-
- override protected def getColumnsByCatalog(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- columnPattern: Pattern): Seq[Row] = {
- val catalog = getCatalog(spark, catalogName)
-
- catalog match {
- case tc: TableCatalog =>
- val namespaces = listNamespacesWithPattern(catalog, schemaPattern)
- val tp = tablePattern.r.pattern
- val identifiers = namespaces.flatMap { ns =>
- tc.listTables(ns).filter(i => tp.matcher(quoteIfNeeded(i.name())).matches())
- }
- identifiers.flatMap { ident =>
- val table = tc.loadTable(ident)
- val namespace = ident.namespace().map(quoteIfNeeded).mkString(".")
- val tableName = quoteIfNeeded(ident.name())
-
- table.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
- .map { case (f, i) => toColumnResult(tc.name(), namespace, tableName, f, i) }
- }
-
- case builtin if builtin.name() == SESSION_CATALOG =>
- super.getColumnsByCatalog(
- spark,
- SESSION_CATALOG,
- schemaPattern,
- tablePattern,
- columnPattern)
- }
- }
-}
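The namespace helpers deleted above compile the JDBC-style `schemaPattern` as a regex and match it against dot-joined namespace parts. The core filter can be sketched standalone as follows (example pattern and namespace values are assumptions for illustration; identifier quoting is omitted for brevity):

```scala
// Sketch: filter dot-joined namespace names against a regex pattern,
// as listNamespacesWithPattern / getSchemasWithPattern do.
val schemaPattern = "test_.*" // assumed example pattern
val p = schemaPattern.r.pattern
val namespaces = Seq(Seq("test_db"), Seq("prod", "db"), Seq("test_tmp"))
val matched = namespaces
  .map(_.mkString("."))                    // join namespace parts with '.'
  .filter(name => p.matcher(name).matches()) // keep full-string matches only
```

Note that `Matcher.matches()` requires the whole string to match, which is why `"prod.db"` is dropped by the `"test_.*"` pattern above.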
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/SparkCatalogShim.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/SparkCatalogShim.scala
deleted file mode 100644
index 83c80652380..00000000000
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/shim/SparkCatalogShim.scala
+++ /dev/null
@@ -1,183 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.kyuubi.engine.spark.shim
-
-import org.apache.spark.sql.{Row, SparkSession}
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.types.StructField
-
-import org.apache.kyuubi.Logging
-import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.sparkMajorMinorVersion
-import org.apache.kyuubi.engine.spark.schema.SchemaHelper
-
-/**
- * A shim that defines the interface interact with Spark's catalogs
- */
-trait SparkCatalogShim extends Logging {
-
- // ///////////////////////////////////////////////////////////////////////////////////////////////
- // Catalog //
- // ///////////////////////////////////////////////////////////////////////////////////////////////
-
- /**
- * Get all register catalogs in Spark's `CatalogManager`
- */
- def getCatalogs(spark: SparkSession): Seq[Row]
-
- protected def catalogExists(spark: SparkSession, catalog: String): Boolean
-
- def setCurrentCatalog(spark: SparkSession, catalog: String): Unit
-
- def getCurrentCatalog(spark: SparkSession): Row
-
- // ///////////////////////////////////////////////////////////////////////////////////////////////
- // Schema //
- // ///////////////////////////////////////////////////////////////////////////////////////////////
-
- /**
- * a list of [[Row]]s, with 2 fields `schemaName: String, catalogName: String`
- */
- def getSchemas(spark: SparkSession, catalogName: String, schemaPattern: String): Seq[Row]
-
- def setCurrentDatabase(spark: SparkSession, databaseName: String): Unit
-
- def getCurrentDatabase(spark: SparkSession): Row
-
- protected def getGlobalTempViewManager(spark: SparkSession, schemaPattern: String): Seq[String]
-
- // ///////////////////////////////////////////////////////////////////////////////////////////////
- // Table & View //
- // ///////////////////////////////////////////////////////////////////////////////////////////////
-
- def getCatalogTablesOrViews(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- tableTypes: Set[String],
- ignoreTableProperties: Boolean): Seq[Row]
-
- def getTempViews(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String): Seq[Row]
-
- protected def getViews(
- spark: SparkSession,
- schemaPattern: String,
- tablePattern: String): Seq[TableIdentifier]
-
- // ///////////////////////////////////////////////////////////////////////////////////////////////
- // Columns //
- // ///////////////////////////////////////////////////////////////////////////////////////////////
-
- def getColumns(
- spark: SparkSession,
- catalogName: String,
- schemaPattern: String,
- tablePattern: String,
- columnPattern: String): Seq[Row]
-
- protected def toColumnResult(
- catalog: String,
- db: String,
- table: String,
- col: StructField,
- pos: Int): Row = {
- // format: off
- Row(
- catalog, // TABLE_CAT
- db, // TABLE_SCHEM
- table, // TABLE_NAME
- col.name, // COLUMN_NAME
- SchemaHelper.toJavaSQLType(col.dataType), // DATA_TYPE
- col.dataType.sql, // TYPE_NAME
- SchemaHelper.getColumnSize(col.dataType).orNull, // COLUMN_SIZE
- null, // BUFFER_LENGTH
- SchemaHelper.getDecimalDigits(col.dataType).orNull, // DECIMAL_DIGITS
- SchemaHelper.getNumPrecRadix(col.dataType).orNull, // NUM_PREC_RADIX
- if (col.nullable) 1 else 0, // NULLABLE
- col.getComment().getOrElse(""), // REMARKS
- null, // COLUMN_DEF
- null, // SQL_DATA_TYPE
- null, // SQL_DATETIME_SUB
- null, // CHAR_OCTET_LENGTH
- pos, // ORDINAL_POSITION
- "YES", // IS_NULLABLE
- null, // SCOPE_CATALOG
- null, // SCOPE_SCHEMA
- null, // SCOPE_TABLE
- null, // SOURCE_DATA_TYPE
- "NO" // IS_AUTO_INCREMENT
- )
- // format: on
- }
-
- // ///////////////////////////////////////////////////////////////////////////////////////////////
- // Miscellaneous //
- // ///////////////////////////////////////////////////////////////////////////////////////////////
-
- protected def invoke(
- obj: Any,
- methodName: String,
- args: (Class[_], AnyRef)*): Any = {
- val (types, values) = args.unzip
- val method = obj.getClass.getMethod(methodName, types: _*)
- method.setAccessible(true)
- method.invoke(obj, values.toSeq: _*)
- }
-
- protected def invoke(
- clazz: Class[_],
- obj: AnyRef,
- methodName: String,
- args: (Class[_], AnyRef)*): AnyRef = {
- val (types, values) = args.unzip
- val method = clazz.getMethod(methodName, types: _*)
- method.setAccessible(true)
- method.invoke(obj, values.toSeq: _*)
- }
-
- protected def getField(o: Any, fieldName: String): Any = {
- val field = o.getClass.getDeclaredField(fieldName)
- field.setAccessible(true)
- field.get(o)
- }
-
- protected def matched(tableTypes: Set[String], tableType: String): Boolean = {
- val typ = if (tableType.equalsIgnoreCase("VIEW")) "VIEW" else "TABLE"
- tableTypes.exists(typ.equalsIgnoreCase)
- }
-
-}
-
-object SparkCatalogShim {
- def apply(): SparkCatalogShim = {
- sparkMajorMinorVersion match {
- case (3, _) => new CatalogShim_v3_0
- case (2, _) => new CatalogShim_v2_4
- case _ =>
- throw new IllegalArgumentException(s"Not Support spark version $sparkMajorMinorVersion")
- }
- }
-
- val SESSION_CATALOG: String = "spark_catalog"
-
- val sparkTableTypes = Set("VIEW", "TABLE")
-}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KDFRegistry.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KDFRegistry.scala
index f4612a3d0a3..a2d50d1515b 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KDFRegistry.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KDFRegistry.scala
@@ -25,7 +25,7 @@ import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
-import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_URL, KYUUBI_SESSION_USER_KEY}
object KDFRegistry {
@@ -73,6 +73,16 @@ object KDFRegistry {
"string",
"1.4.0")
+ val engine_url: KyuubiDefinedFunction = create(
+ "engine_url",
+ udf { () =>
+ Option(TaskContext.get()).map(_.getLocalProperty(KYUUBI_ENGINE_URL))
+ .getOrElse(throw new RuntimeException("Unable to get engine url"))
+ },
+ "Return the engine url for the associated query engine",
+ "string",
+ "1.8.0")
+
def create(
name: String,
udf: UserDefinedFunction,
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunction.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunction.scala
index 30228bf7264..6bc2e3ddb3e 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunction.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunction.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.engine.spark.udf
import org.apache.spark.sql.expressions.UserDefinedFunction
/**
- * A wrapper for Spark' [[UserDefinedFunction]]
+ * A wrapper for Spark's [[UserDefinedFunction]]
*
* @param name function name
* @param udf user-defined function
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/util/SparkCatalogUtils.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/util/SparkCatalogUtils.scala
new file mode 100644
index 00000000000..18a14494e85
--- /dev/null
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/kyuubi/engine/spark/util/SparkCatalogUtils.scala
@@ -0,0 +1,373 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine.spark.util
+
+import java.util.regex.Pattern
+
+import org.apache.commons.lang3.StringUtils
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogPlugin, SupportsNamespaces, TableCatalog}
+import org.apache.spark.sql.types.StructField
+
+import org.apache.kyuubi.Logging
+import org.apache.kyuubi.engine.spark.schema.SchemaHelper
+import org.apache.kyuubi.util.reflect.ReflectUtils._
+
+/**
+ * Utility methods for interacting with Spark's catalogs
+ */
+object SparkCatalogUtils extends Logging {
+
+ private val VIEW = "VIEW"
+ private val TABLE = "TABLE"
+
+ val SESSION_CATALOG: String = "spark_catalog"
+ val sparkTableTypes: Set[String] = Set(VIEW, TABLE)
+
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+ // Catalog //
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+
+ /**
+ * Get all registered catalogs in Spark's `CatalogManager`
+ */
+ def getCatalogs(spark: SparkSession): Seq[Row] = {
+
+ // A [[CatalogManager]] is unique to each session
+ val catalogMgr = spark.sessionState.catalogManager
+ // get the custom v2 session catalog or default spark_catalog
+ val sessionCatalog = invokeAs[AnyRef](catalogMgr, "v2SessionCatalog")
+ val defaultCatalog = catalogMgr.currentCatalog
+
+ val defaults = Seq(sessionCatalog, defaultCatalog).distinct.map(invokeAs[String](_, "name"))
+ val catalogs = getField[scala.collection.Map[String, _]](catalogMgr, "catalogs")
+ (catalogs.keys ++: defaults).distinct.map(Row(_))
+ }
+
+ def getCatalog(spark: SparkSession, catalogName: String): CatalogPlugin = {
+ val catalogManager = spark.sessionState.catalogManager
+ if (StringUtils.isBlank(catalogName)) {
+ catalogManager.currentCatalog
+ } else {
+ catalogManager.catalog(catalogName)
+ }
+ }
+
+ def setCurrentCatalog(spark: SparkSession, catalog: String): Unit = {
+ // SPARK-36841(3.3.0) Ensure setCurrentCatalog method catalog must exist
+ if (spark.sessionState.catalogManager.isCatalogRegistered(catalog)) {
+ spark.sessionState.catalogManager.setCurrentCatalog(catalog)
+ } else {
+ throw new IllegalArgumentException(s"Cannot find catalog plugin class for catalog '$catalog'")
+ }
+ }
+
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+ // Schema //
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+
+ /**
+ * a list of [[Row]]s, with 2 fields `schemaName: String, catalogName: String`
+ */
+ def getSchemas(
+ spark: SparkSession,
+ catalogName: String,
+ schemaPattern: String): Seq[Row] = {
+ if (catalogName == SparkCatalogUtils.SESSION_CATALOG) {
+ (spark.sessionState.catalog.listDatabases(schemaPattern) ++
+ getGlobalTempViewManager(spark, schemaPattern))
+ .map(Row(_, SparkCatalogUtils.SESSION_CATALOG))
+ } else {
+ val catalog = getCatalog(spark, catalogName)
+ getSchemasWithPattern(catalog, schemaPattern).map(Row(_, catalog.name))
+ }
+ }
+
+ private def getGlobalTempViewManager(
+ spark: SparkSession,
+ schemaPattern: String): Seq[String] = {
+ val database = spark.sharedState.globalTempViewManager.database
+ Option(database).filter(_.matches(schemaPattern)).toSeq
+ }
+
+ private def listAllNamespaces(
+ catalog: SupportsNamespaces,
+ namespaces: Array[Array[String]]): Array[Array[String]] = {
+ val children = namespaces.flatMap { ns =>
+ catalog.listNamespaces(ns)
+ }
+ if (children.isEmpty) {
+ namespaces
+ } else {
+ namespaces ++: listAllNamespaces(catalog, children)
+ }
+ }
+
+ private def listAllNamespaces(catalog: CatalogPlugin): Array[Array[String]] = {
+ catalog match {
+ case catalog: CatalogExtension =>
+ // DSv2 does not support passing schemaPattern through transparently
+ catalog.defaultNamespace() +: catalog.listNamespaces(Array())
+ case catalog: SupportsNamespaces =>
+ val rootSchema = catalog.listNamespaces()
+ val allSchemas = listAllNamespaces(catalog, rootSchema)
+ allSchemas
+ }
+ }
+
+ private def listNamespacesWithPattern(
+ catalog: CatalogPlugin,
+ schemaPattern: String): Array[Array[String]] = {
+ listAllNamespaces(catalog).filter { ns =>
+ val quoted = ns.map(quoteIfNeeded).mkString(".")
+ schemaPattern.r.pattern.matcher(quoted).matches()
+ }.map(_.toList).toList.distinct.map(_.toArray).toArray // go via List so distinct uses structural equality
+ }
+
+ private def getSchemasWithPattern(catalog: CatalogPlugin, schemaPattern: String): Seq[String] = {
+ val p = schemaPattern.r.pattern
+ listAllNamespaces(catalog).flatMap { ns =>
+ val quoted = ns.map(quoteIfNeeded).mkString(".")
+ if (p.matcher(quoted).matches()) Some(quoted) else None
+ }.distinct
+ }
+
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+ // Table & View //
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+
+ def getCatalogTablesOrViews(
+ spark: SparkSession,
+ catalogName: String,
+ schemaPattern: String,
+ tablePattern: String,
+ tableTypes: Set[String],
+ ignoreTableProperties: Boolean = false): Seq[Row] = {
+ val catalog = getCatalog(spark, catalogName)
+ val namespaces = listNamespacesWithPattern(catalog, schemaPattern)
+ catalog match {
+ case builtin if builtin.name() == SESSION_CATALOG =>
+ val catalog = spark.sessionState.catalog
+ val databases = catalog.listDatabases(schemaPattern)
+
+ def isMatchedTableType(tableTypes: Set[String], tableType: String): Boolean = {
+ val typ = if (tableType.equalsIgnoreCase(VIEW)) VIEW else TABLE
+ tableTypes.exists(typ.equalsIgnoreCase)
+ }
+
+ databases.flatMap { db =>
+ val identifiers = catalog.listTables(db, tablePattern, includeLocalTempViews = false)
+ catalog.getTablesByName(identifiers)
+ .filter(t => isMatchedTableType(tableTypes, t.tableType.name)).map { t =>
+ val typ = if (t.tableType.name == VIEW) VIEW else TABLE
+ Row(
+ catalogName,
+ t.database,
+ t.identifier.table,
+ typ,
+ t.comment.getOrElse(""),
+ null,
+ null,
+ null,
+ null,
+ null)
+ }
+ }
+ case tc: TableCatalog =>
+ val tp = tablePattern.r.pattern
+ val identifiers = namespaces.flatMap { ns =>
+ tc.listTables(ns).filter(i => tp.matcher(quoteIfNeeded(i.name())).matches())
+ }
+ identifiers.map { ident =>
+ // TODO: restore view type for session catalog
+ val comment = if (ignoreTableProperties) ""
+ else { // loading a table is a time-consuming operation
+ tc.loadTable(ident).properties().getOrDefault(TableCatalog.PROP_COMMENT, "")
+ }
+ val schema = ident.namespace().map(quoteIfNeeded).mkString(".")
+ val tableName = quoteIfNeeded(ident.name())
+ Row(catalog.name(), schema, tableName, TABLE, comment, null, null, null, null, null)
+ }
+ case _ => Seq.empty[Row]
+ }
+ }
+
+ private def getColumnsByCatalog(
+ spark: SparkSession,
+ catalogName: String,
+ schemaPattern: String,
+ tablePattern: String,
+ columnPattern: Pattern): Seq[Row] = {
+ val catalog = getCatalog(spark, catalogName)
+
+ catalog match {
+ case tc: TableCatalog =>
+ val namespaces = listNamespacesWithPattern(catalog, schemaPattern)
+ val tp = tablePattern.r.pattern
+ val identifiers = namespaces.flatMap { ns =>
+ tc.listTables(ns).filter(i => tp.matcher(quoteIfNeeded(i.name())).matches())
+ }
+ identifiers.flatMap { ident =>
+ val table = tc.loadTable(ident)
+ val namespace = ident.namespace().map(quoteIfNeeded).mkString(".")
+ val tableName = quoteIfNeeded(ident.name())
+
+ table.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
+ .map { case (f, i) => toColumnResult(tc.name(), namespace, tableName, f, i) }
+ }
+
+ case builtin if builtin.name() == SESSION_CATALOG =>
+ val catalog = spark.sessionState.catalog
+ val databases = catalog.listDatabases(schemaPattern)
+ databases.flatMap { db =>
+ val identifiers = catalog.listTables(db, tablePattern, includeLocalTempViews = true)
+ catalog.getTablesByName(identifiers).flatMap { t =>
+ t.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
+ .map { case (f, i) =>
+ toColumnResult(catalogName, t.database, t.identifier.table, f, i)
+ }
+ }
+ }
+ }
+ }
+
+ def getTempViews(
+ spark: SparkSession,
+ catalogName: String,
+ schemaPattern: String,
+ tablePattern: String): Seq[Row] = {
+ val views = getViews(spark, schemaPattern, tablePattern)
+ views.map { ident =>
+ Row(catalogName, ident.database.orNull, ident.table, VIEW, "", null, null, null, null, null)
+ }
+ }
+
+ private def getViews(
+ spark: SparkSession,
+ schemaPattern: String,
+ tablePattern: String): Seq[TableIdentifier] = {
+ val db = getGlobalTempViewManager(spark, schemaPattern)
+ if (db.nonEmpty) {
+ spark.sessionState.catalog.listTables(db.head, tablePattern)
+ } else {
+ spark.sessionState.catalog.listLocalTempViews(tablePattern)
+ }
+ }
+
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+ // Columns //
+ // ///////////////////////////////////////////////////////////////////////////////////////////////
+
+ def getColumns(
+ spark: SparkSession,
+ catalogName: String,
+ schemaPattern: String,
+ tablePattern: String,
+ columnPattern: String): Seq[Row] = {
+
+ val cp = columnPattern.r.pattern
+ val byCatalog = getColumnsByCatalog(spark, catalogName, schemaPattern, tablePattern, cp)
+ val byGlobalTmpDB = getColumnsByGlobalTempViewManager(spark, schemaPattern, tablePattern, cp)
+ val byLocalTmp = getColumnsByLocalTempViews(spark, tablePattern, cp)
+
+ byCatalog ++ byGlobalTmpDB ++ byLocalTmp
+ }
+
+ private def getColumnsByGlobalTempViewManager(
+ spark: SparkSession,
+ schemaPattern: String,
+ tablePattern: String,
+ columnPattern: Pattern): Seq[Row] = {
+ val catalog = spark.sessionState.catalog
+
+ getGlobalTempViewManager(spark, schemaPattern).flatMap { globalTmpDb =>
+ catalog.globalTempViewManager.listViewNames(tablePattern).flatMap { v =>
+ catalog.globalTempViewManager.get(v).map { plan =>
+ plan.schema.zipWithIndex.filter(f => columnPattern.matcher(f._1.name).matches())
+ .map { case (f, i) =>
+ toColumnResult(SparkCatalogUtils.SESSION_CATALOG, globalTmpDb, v, f, i)
+ }
+ }
+ }.flatten
+ }
+ }
+
+ private def getColumnsByLocalTempViews(
+ spark: SparkSession,
+ tablePattern: String,
+ columnPattern: Pattern): Seq[Row] = {
+ val catalog = spark.sessionState.catalog
+
+ catalog.listLocalTempViews(tablePattern)
+ .map(v => (v, catalog.getTempView(v.table).get))
+ .flatMap { case (v, plan) =>
+ plan.schema.zipWithIndex
+ .filter(f => columnPattern.matcher(f._1.name).matches())
+ .map { case (f, i) =>
+ toColumnResult(SparkCatalogUtils.SESSION_CATALOG, null, v.table, f, i)
+ }
+ }
+ }
+
+ private def toColumnResult(
+ catalog: String,
+ db: String,
+ table: String,
+ col: StructField,
+ pos: Int): Row = {
+ // format: off
+ Row(
+ catalog, // TABLE_CAT
+ db, // TABLE_SCHEM
+ table, // TABLE_NAME
+ col.name, // COLUMN_NAME
+ SchemaHelper.toJavaSQLType(col.dataType), // DATA_TYPE
+ col.dataType.sql, // TYPE_NAME
+ SchemaHelper.getColumnSize(col.dataType).orNull, // COLUMN_SIZE
+ null, // BUFFER_LENGTH
+ SchemaHelper.getDecimalDigits(col.dataType).orNull, // DECIMAL_DIGITS
+ SchemaHelper.getNumPrecRadix(col.dataType).orNull, // NUM_PREC_RADIX
+ if (col.nullable) 1 else 0, // NULLABLE
+ col.getComment().getOrElse(""), // REMARKS
+ null, // COLUMN_DEF
+ null, // SQL_DATA_TYPE
+ null, // SQL_DATETIME_SUB
+ null, // CHAR_OCTET_LENGTH
+ pos, // ORDINAL_POSITION
+ "YES", // IS_NULLABLE
+ null, // SCOPE_CATALOG
+ null, // SCOPE_SCHEMA
+ null, // SCOPE_TABLE
+ null, // SOURCE_DATA_TYPE
+ "NO" // IS_AUTO_INCREMENT
+ )
+ // format: on
+ }
+
+ /**
+ * Forked from Apache Spark's [[org.apache.spark.sql.catalyst.util.quoteIfNeeded]]
+ */
+ def quoteIfNeeded(part: String): String = {
+ if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) {
+ part
+ } else {
+ s"`${part.replace("`", "``")}`"
+ }
+ }
+}
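The `quoteIfNeeded` helper added above (forked from Spark's `org.apache.spark.sql.catalyst.util.quoteIfNeeded`) applies Spark's identifier-quoting rule: plain alphanumeric/underscore names that are not all digits pass through unchanged, while anything else is wrapped in backticks with embedded backticks doubled. The same rule as a standalone sketch:

```scala
// Standalone sketch of the identifier-quoting rule shown above.
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
  else s"`${part.replace("`", "``")}`"
```

This is what makes dot-joined namespaces unambiguous: a part containing `.` is quoted before joining, so `a.b` the schema is distinguishable from the two-level namespace `a` / `b`.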
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SQLOperationListener.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SQLOperationListener.scala
index 1a57fcf2994..4e4a940d295 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SQLOperationListener.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SQLOperationListener.scala
@@ -45,7 +45,7 @@ class SQLOperationListener(
private val operationId: String = operation.getHandle.identifier.toString
private lazy val activeJobs = new java.util.HashSet[Int]()
- private lazy val activeStages = new ConcurrentHashMap[StageAttempt, StageInfo]()
+ private lazy val activeStages = new ConcurrentHashMap[SparkStageAttempt, SparkStageInfo]()
private var executionId: Option[Long] = None
private val conf: KyuubiConf = operation.getSession.sessionManager.getConf
@@ -120,10 +120,10 @@ class SQLOperationListener(
val stageInfo = stageSubmitted.stageInfo
val stageId = stageInfo.stageId
val attemptNumber = stageInfo.attemptNumber()
- val stageAttempt = StageAttempt(stageId, attemptNumber)
+ val stageAttempt = SparkStageAttempt(stageId, attemptNumber)
activeStages.put(
stageAttempt,
- new StageInfo(stageId, stageInfo.numTasks))
+ new SparkStageInfo(stageId, stageInfo.numTasks))
withOperationLog {
info(s"Query [$operationId]: Stage $stageId.$attemptNumber started " +
s"with ${stageInfo.numTasks} tasks, ${activeStages.size()} active stages running")
@@ -134,7 +134,7 @@ class SQLOperationListener(
override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
val stageInfo = stageCompleted.stageInfo
- val stageAttempt = StageAttempt(stageInfo.stageId, stageInfo.attemptNumber())
+ val stageAttempt = SparkStageAttempt(stageInfo.stageId, stageInfo.attemptNumber())
activeStages.synchronized {
if (activeStages.remove(stageAttempt) != null) {
withOperationLog(super.onStageCompleted(stageCompleted))
@@ -143,19 +143,19 @@ class SQLOperationListener(
}
override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = activeStages.synchronized {
- val stageAttempt = StageAttempt(taskStart.stageId, taskStart.stageAttemptId)
+ val stageAttempt = SparkStageAttempt(taskStart.stageId, taskStart.stageAttemptId)
if (activeStages.containsKey(stageAttempt)) {
- activeStages.get(stageAttempt).numActiveTasks += 1
+ activeStages.get(stageAttempt).numActiveTasks.getAndIncrement()
super.onTaskStart(taskStart)
}
}
override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = activeStages.synchronized {
- val stageAttempt = StageAttempt(taskEnd.stageId, taskEnd.stageAttemptId)
+ val stageAttempt = SparkStageAttempt(taskEnd.stageId, taskEnd.stageAttemptId)
if (activeStages.containsKey(stageAttempt)) {
- activeStages.get(stageAttempt).numActiveTasks -= 1
+ activeStages.get(stageAttempt).numActiveTasks.getAndDecrement()
if (taskEnd.reason == org.apache.spark.Success) {
- activeStages.get(stageAttempt).numCompleteTasks += 1
+ activeStages.get(stageAttempt).numCompleteTasks.getAndIncrement()
}
super.onTaskEnd(taskEnd)
}
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkConsoleProgressBar.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkConsoleProgressBar.scala
index fc2ebd5f8c8..dc8b493cc04 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkConsoleProgressBar.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkConsoleProgressBar.scala
@@ -29,7 +29,7 @@ import org.apache.kyuubi.operation.Operation
class SparkConsoleProgressBar(
operation: Operation,
- liveStages: ConcurrentHashMap[StageAttempt, StageInfo],
+ liveStages: ConcurrentHashMap[SparkStageAttempt, SparkStageInfo],
updatePeriodMSec: Long,
timeFormat: String)
extends Logging {
@@ -77,7 +77,7 @@ class SparkConsoleProgressBar(
* after your last output, keeps overwriting itself to hold in one line. The logging will follow
* the progress bar, then progress bar will be showed in next line without overwrite logs.
*/
- private def show(now: Long, stages: Seq[StageInfo]): Unit = {
+ private def show(now: Long, stages: Seq[SparkStageInfo]): Unit = {
val width = TerminalWidth / stages.size
val bar = stages.map { s =>
val total = s.numTasks
@@ -86,7 +86,7 @@ class SparkConsoleProgressBar(
val w = width - header.length - tailer.length
val bar =
if (w > 0) {
- val percent = w * s.numCompleteTasks / total
+ val percent = w * s.numCompleteTasks.get / total
(0 until w).map { i =>
if (i < percent) "=" else if (i == percent) ">" else " "
}.mkString("")
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkProgressMonitor.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkProgressMonitor.scala
index a46cbecc22e..1d9ef53eae9 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkProgressMonitor.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkProgressMonitor.scala
@@ -136,12 +136,8 @@ class SparkProgressMonitor(spark: SparkSession, jobGroup: String) {
trimmedVName = s.substring(0, COLUMN_1_WIDTH - 2)
trimmedVName += ".."
} else trimmedVName += " "
- val result = new StringBuilder(trimmedVName)
val toFill = (spaceRemaining * percent).toInt
- for (i <- 0 until toFill) {
- result.append(".")
- }
- result.toString
+ s"$trimmedVName${"." * toFill}"
}
private def getCompletedStages: Int = {
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkSQLEngineListener.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkSQLEngineListener.scala
index 8e32b53291a..48f157a43d6 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkSQLEngineListener.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkSQLEngineListener.scala
@@ -40,9 +40,9 @@ import org.apache.kyuubi.service.{Serverable, ServiceState}
class SparkSQLEngineListener(server: Serverable) extends SparkListener with Logging {
// the conf of server is null before initialized, use lazy val here
- private lazy val deregisterExceptions: Seq[String] =
+ private lazy val deregisterExceptions: Set[String] =
server.getConf.get(ENGINE_DEREGISTER_EXCEPTION_CLASSES)
- private lazy val deregisterMessages: Seq[String] =
+ private lazy val deregisterMessages: Set[String] =
server.getConf.get(ENGINE_DEREGISTER_EXCEPTION_MESSAGES)
private lazy val deregisterExceptionTTL: Long =
server.getConf.get(ENGINE_DEREGISTER_EXCEPTION_TTL)
@@ -74,7 +74,7 @@ class SparkSQLEngineListener(server: Serverable) extends SparkListener with Logg
case JobFailed(e) if e != null =>
val cause = findCause(e)
var deregisterInfo: Option[String] = None
- if (deregisterExceptions.exists(_.equals(cause.getClass.getCanonicalName))) {
+ if (deregisterExceptions.contains(cause.getClass.getCanonicalName)) {
deregisterInfo = Some("Job failed exception class is in the set of " +
s"${ENGINE_DEREGISTER_EXCEPTION_CLASSES.key}, deregistering the engine.")
} else if (deregisterMessages.exists(stringifyException(cause).contains)) {
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkUtilsHelper.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkUtilsHelper.scala
index e2f51e648c0..106be3fc789 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkUtilsHelper.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/SparkUtilsHelper.scala
@@ -43,4 +43,13 @@ object SparkUtilsHelper extends Logging {
def getLocalDir(conf: SparkConf): String = {
Utils.getLocalDir(conf)
}
+
+ def classesArePresent(className: String): Boolean = {
+ try {
+ Utils.classForName(className)
+ true
+ } catch {
+ case _: ClassNotFoundException | _: NoClassDefFoundError => false
+ }
+ }
}
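Annotation: the `classesArePresent` helper added above follows the standard probe-before-use pattern for optional dependencies. A minimal standalone sketch of the same idea (using plain `Class.forName` in place of Spark's `Utils.classForName`, which additionally picks the right classloader):

```scala
object ClassProbe {
  // Returns true only when the class can actually be linked; catching
  // NoClassDefFoundError as well covers classes whose own dependencies
  // are missing from the classpath.
  def isPresent(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
}
```

Call sites can then gate dependency-specific code paths, e.g. `if (ClassProbe.isPresent("org.apache.arrow.vector.VectorSchemaRoot")) ...`.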
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/StageStatus.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/StageStatus.scala
index 14457086254..2ea9c3fdae6 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/StageStatus.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/kyuubi/StageStatus.scala
@@ -17,11 +17,13 @@
package org.apache.spark.kyuubi
-case class StageAttempt(stageId: Int, stageAttemptId: Int) {
+import java.util.concurrent.atomic.AtomicInteger
+
+case class SparkStageAttempt(stageId: Int, stageAttemptId: Int) {
override def toString: String = s"Stage $stageId (Attempt $stageAttemptId)"
}
-class StageInfo(val stageId: Int, val numTasks: Int) {
- var numActiveTasks = 0
- var numCompleteTasks = 0
+class SparkStageInfo(val stageId: Int, val numTasks: Int) {
+ var numActiveTasks = new AtomicInteger(0)
+ var numCompleteTasks = new AtomicInteger(0)
}
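Annotation: the switch from plain `var` counters to `AtomicInteger` in `SparkStageInfo` matters because listener events and UI readers can touch the counters from different threads, and `x += 1` on a `var` is a non-atomic read-modify-write. A standalone sketch of the intended usage (the method names here are illustrative, not part of the patch):

```scala
import java.util.concurrent.atomic.AtomicInteger

class TaskCounters(val numTasks: Int) {
  val numActiveTasks = new AtomicInteger(0)
  val numCompleteTasks = new AtomicInteger(0)

  // Each mutation is a single atomic operation, so concurrent
  // start/end events cannot lose updates.
  def onTaskStart(): Unit = numActiveTasks.incrementAndGet()
  def onTaskEnd(): Unit = {
    numActiveTasks.decrementAndGet()
    numCompleteTasks.incrementAndGet()
  }
}
```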
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/execution/arrow/KyuubiArrowConverters.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/execution/arrow/KyuubiArrowConverters.scala
new file mode 100644
index 00000000000..5c4d7086ff3
--- /dev/null
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/execution/arrow/KyuubiArrowConverters.scala
@@ -0,0 +1,352 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.arrow
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
+import java.lang.{Boolean => JBoolean}
+import java.nio.channels.Channels
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.ipc.{ArrowStreamWriter, ReadChannel, WriteChannel}
+import org.apache.arrow.vector.ipc.message.{IpcOption, MessageSerializer}
+import org.apache.arrow.vector.types.pojo.{Schema => ArrowSchema}
+import org.apache.spark.TaskContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.{InternalRow, SQLConfHelper}
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.execution.CollectLimitExec
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.util.ArrowUtils
+import org.apache.spark.util.Utils
+
+import org.apache.kyuubi.util.reflect.DynMethods
+
+object KyuubiArrowConverters extends SQLConfHelper with Logging {
+
+ type Batch = (Array[Byte], Long)
+
+ /**
+ * This method slices the input Arrow record batch byte array `bytes`, starting from
+ * row `start` and taking `length` rows.
+ */
+ def slice(
+ schema: StructType,
+ timeZoneId: String,
+ bytes: Array[Byte],
+ start: Int,
+ length: Int): Array[Byte] = {
+ val in = new ByteArrayInputStream(bytes)
+ val out = new ByteArrayOutputStream(bytes.length)
+
+ var vectorSchemaRoot: VectorSchemaRoot = null
+ var slicedVectorSchemaRoot: VectorSchemaRoot = null
+
+ val sliceAllocator = ArrowUtils.rootAllocator.newChildAllocator(
+ "slice",
+ 0,
+ Long.MaxValue)
+ val arrowSchema = toArrowSchema(schema, timeZoneId, true, false)
+ vectorSchemaRoot = VectorSchemaRoot.create(arrowSchema, sliceAllocator)
+ try {
+ val recordBatch = MessageSerializer.deserializeRecordBatch(
+ new ReadChannel(Channels.newChannel(in)),
+ sliceAllocator)
+ val vectorLoader = new VectorLoader(vectorSchemaRoot)
+ vectorLoader.load(recordBatch)
+ recordBatch.close()
+ slicedVectorSchemaRoot = vectorSchemaRoot.slice(start, length)
+
+ val unloader = new VectorUnloader(slicedVectorSchemaRoot)
+ val writeChannel = new WriteChannel(Channels.newChannel(out))
+ val batch = unloader.getRecordBatch()
+ MessageSerializer.serialize(writeChannel, batch)
+ batch.close()
+ out.toByteArray()
+ } finally {
+ in.close()
+ out.close()
+ if (vectorSchemaRoot != null) {
+ vectorSchemaRoot.getFieldVectors.asScala.foreach(_.close())
+ vectorSchemaRoot.close()
+ }
+ if (slicedVectorSchemaRoot != null) {
+ slicedVectorSchemaRoot.getFieldVectors.asScala.foreach(_.close())
+ slicedVectorSchemaRoot.close()
+ }
+ sliceAllocator.close()
+ }
+ }
+
+ /**
+ * Forked from `org.apache.spark.sql.execution.SparkPlan#executeTake()`, the algorithm can be
+ * summarized in the following steps:
+ * 1. If the limit specified in the CollectLimitExec object is 0, the function returns an empty
+ * array of batches.
+ * 2. Otherwise, execute the child query plan of the CollectLimitExec object to obtain an RDD of
+ * data to collect.
+ * 3. Use an iterative approach to collect data in batches until the specified limit is reached.
+ * In each iteration, it selects a subset of the partitions of the RDD to scan and tries to
+ * collect data from them.
+ * 4. For each partition subset, we use the runJob method of the Spark context to execute a
+ * closure that scans the partition data and converts it to Arrow batches.
+ * 5. Check if the collected data reaches the specified limit. If not, it selects another subset
+ * of partitions to scan and repeats the process until the limit is reached or all partitions
+ * have been scanned.
+ * 6. Return an array of all the collected Arrow batches.
+ *
+ * Note that:
+ * 1. The total row count of the returned Arrow batches may be >= `limit`, if the input df has
+ * more than `limit` rows
+ * 2. We don't implement the `takeFromEnd` logic
+ *
+ * @return an array of (serialized Arrow batch, batch row count) pairs
+ */
+ def takeAsArrowBatches(
+ collectLimitExec: CollectLimitExec,
+ maxRecordsPerBatch: Long,
+ maxEstimatedBatchSize: Long,
+ timeZoneId: String): Array[Batch] = {
+ val n = collectLimitExec.limit
+ val schema = collectLimitExec.schema
+ if (n == 0) {
+ new Array[Batch](0)
+ } else {
+ val limitScaleUpFactor = Math.max(conf.limitScaleUpFactor, 2)
+ // TODO: refactor and reuse the code from RDD's take()
+ val childRDD = collectLimitExec.child.execute()
+ val buf = new ArrayBuffer[Batch]
+ var bufferedRowSize = 0L
+ val totalParts = childRDD.partitions.length
+ var partsScanned = 0
+ while (bufferedRowSize < n && partsScanned < totalParts) {
+ // The number of partitions to try in this iteration. It is ok for this number to be
+ // greater than totalParts because we actually cap it at totalParts in runJob.
+ var numPartsToTry = limitInitialNumPartitions
+ if (partsScanned > 0) {
+ // If we didn't find any rows after the previous iteration, multiply by
+ // limitScaleUpFactor and retry. Otherwise, interpolate the number of partitions we need
+ // to try, but overestimate it by 50%. We also cap the estimation in the end.
+ if (buf.isEmpty) {
+ numPartsToTry = partsScanned * limitScaleUpFactor
+ } else {
+ val left = n - bufferedRowSize
+ // As left > 0, numPartsToTry is always >= 1
+ numPartsToTry = Math.ceil(1.5 * left * partsScanned / bufferedRowSize).toInt
+ numPartsToTry = Math.min(numPartsToTry, partsScanned * limitScaleUpFactor)
+ }
+ }
+
+ val partsToScan =
+ partsScanned.until(math.min(partsScanned + numPartsToTry, totalParts))
+
+ // TODO: SparkPlan.session was introduced in SPARK-35798; use it instead of
+ // SparkSession.active once we drop Spark 3.1.x support.
+ val sc = SparkSession.active.sparkContext
+ val res = sc.runJob(
+ childRDD,
+ (it: Iterator[InternalRow]) => {
+ val batches = toBatchIterator(
+ it,
+ schema,
+ maxRecordsPerBatch,
+ maxEstimatedBatchSize,
+ n,
+ timeZoneId)
+ batches.map(b => b -> batches.rowCountInLastBatch).toArray
+ },
+ partsToScan)
+
+ var i = 0
+ while (bufferedRowSize < n && i < res.length) {
+ var j = 0
+ val batches = res(i)
+ while (j < batches.length && n > bufferedRowSize) {
+ val batch = batches(j)
+ val (_, batchSize) = batch
+ buf += batch
+ bufferedRowSize += batchSize
+ j += 1
+ }
+ i += 1
+ }
+ partsScanned += partsToScan.size
+ }
+
+ buf.toArray
+ }
+ }
+
+ /**
+ * Spark introduced the config `spark.sql.limit.initialNumPartitions` in 3.4.0, see SPARK-40211.
+ */
+ private def limitInitialNumPartitions: Int = {
+ conf.getConfString("spark.sql.limit.initialNumPartitions", "1")
+ .toInt
+ }
+
+ /**
+ * Different from [[org.apache.spark.sql.execution.arrow.ArrowConverters.toBatchIterator]],
+ * the returned iterator tracks the row count of each emitted Arrow batch.
+ */
+ def toBatchIterator(
+ rowIter: Iterator[InternalRow],
+ schema: StructType,
+ maxRecordsPerBatch: Long,
+ maxEstimatedBatchSize: Long,
+ limit: Long,
+ timeZoneId: String): ArrowBatchIterator = {
+ new ArrowBatchIterator(
+ rowIter,
+ schema,
+ maxRecordsPerBatch,
+ maxEstimatedBatchSize,
+ limit,
+ timeZoneId,
+ TaskContext.get)
+ }
+
+ /**
+ * This class ArrowBatchIterator is derived from
+ * [[org.apache.spark.sql.execution.arrow.ArrowConverters.ArrowBatchWithSchemaIterator]],
+ * with two key differences:
+ * 1. there is no requirement to write the schema at the batch header
+ * 2. iteration halts when `rowCount` equals `limit`
+ * Note that `limit < 0` means no limit: all rows in the iterator are returned.
+ */
+ private[sql] class ArrowBatchIterator(
+ rowIter: Iterator[InternalRow],
+ schema: StructType,
+ maxRecordsPerBatch: Long,
+ maxEstimatedBatchSize: Long,
+ limit: Long,
+ timeZoneId: String,
+ context: TaskContext)
+ extends Iterator[Array[Byte]] {
+
+ protected val arrowSchema = toArrowSchema(schema, timeZoneId, true, false)
+ private val allocator =
+ ArrowUtils.rootAllocator.newChildAllocator(
+ s"to${this.getClass.getSimpleName}",
+ 0,
+ Long.MaxValue)
+
+ private val root = VectorSchemaRoot.create(arrowSchema, allocator)
+ protected val unloader = new VectorUnloader(root)
+ protected val arrowWriter = ArrowWriter.create(root)
+
+ Option(context).foreach {
+ _.addTaskCompletionListener[Unit] { _ =>
+ root.close()
+ allocator.close()
+ }
+ }
+
+ override def hasNext: Boolean = (rowIter.hasNext && (rowCount < limit || limit < 0)) || {
+ root.close()
+ allocator.close()
+ false
+ }
+
+ var rowCountInLastBatch: Long = 0
+ var rowCount: Long = 0
+
+ override def next(): Array[Byte] = {
+ val out = new ByteArrayOutputStream()
+ val writeChannel = new WriteChannel(Channels.newChannel(out))
+
+ rowCountInLastBatch = 0
+ var estimatedBatchSize = 0L
+ Utils.tryWithSafeFinally {
+
+ // Always write the first row.
+ while (rowIter.hasNext && (
+ // For maxBatchSize and maxRecordsPerBatch, respect whichever is smaller.
+ // If the size in bytes is positive (set properly), always write the first row.
+ rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
+ // If the size in bytes of the rows is 0 or negative, don't limit it.
+ estimatedBatchSize <= 0 ||
+ estimatedBatchSize < maxEstimatedBatchSize ||
+ // If the max records per batch is 0 or negative, don't limit it.
+ maxRecordsPerBatch <= 0 ||
+ rowCountInLastBatch < maxRecordsPerBatch ||
+ rowCount < limit ||
+ limit < 0)) {
+ val row = rowIter.next()
+ arrowWriter.write(row)
+ estimatedBatchSize += (row match {
+ case ur: UnsafeRow => ur.getSizeInBytes
+ // Trying to estimate the size of the current row
+ case _: InternalRow => schema.defaultSize
+ })
+ rowCountInLastBatch += 1
+ rowCount += 1
+ }
+ arrowWriter.finish()
+ val batch = unloader.getRecordBatch()
+ MessageSerializer.serialize(writeChannel, batch)
+
+ // Always write the Ipc options at the end.
+ ArrowStreamWriter.writeEndOfStream(writeChannel, ARROW_IPC_OPTION_DEFAULT)
+
+ batch.close()
+ } {
+ arrowWriter.reset()
+ }
+
+ out.toByteArray
+ }
+ }
+
+ // the signature of [[ArrowUtils.toArrowSchema]] was changed in SPARK-41971 (since
+ // Spark 3.5)
+ private lazy val toArrowSchemaMethod = DynMethods.builder("toArrowSchema")
+ .impl( // for Spark 3.4 or previous
+ "org.apache.spark.sql.util.ArrowUtils",
+ classOf[StructType],
+ classOf[String])
+ .impl( // for Spark 3.5 or later
+ "org.apache.spark.sql.util.ArrowUtils",
+ classOf[StructType],
+ classOf[String],
+ classOf[Boolean],
+ classOf[Boolean])
+ .build()
+
+ /**
+ * This function calls [[ArrowUtils.toArrowSchema]] reflectively to adapt to both signatures.
+ */
+ private def toArrowSchema(
+ schema: StructType,
+ timeZone: String,
+ errorOnDuplicatedFieldNames: JBoolean,
+ largeVarTypes: JBoolean): ArrowSchema = {
+ toArrowSchemaMethod.invoke[ArrowSchema](
+ ArrowUtils,
+ schema,
+ timeZone,
+ errorOnDuplicatedFieldNames,
+ largeVarTypes)
+ }
+
+ // IpcOption.DEFAULT was introduced in ARROW-11081 (Arrow 4.0.0), add this to adapt to Spark 3.1/3.2
+ final private val ARROW_IPC_OPTION_DEFAULT = new IpcOption()
+}
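Annotation: step 5 of `takeAsArrowBatches` can be made concrete by isolating the partition scale-up heuristic as a pure function. The sketch below is for illustration only (an assumed helper, not part of the patch); it mirrors the arithmetic in the loop above:

```scala
def nextNumPartsToTry(
    limit: Long,
    bufferedRows: Long,
    partsScanned: Int,
    limitScaleUpFactor: Int): Int = {
  if (partsScanned == 0) {
    // first iteration: spark.sql.limit.initialNumPartitions (default 1)
    1
  } else if (bufferedRows == 0) {
    // nothing collected yet: grow geometrically
    partsScanned * limitScaleUpFactor
  } else {
    // interpolate from the observed row density, overestimate by 50%,
    // and cap by the scale-up factor
    val left = limit - bufferedRows
    val estimate = Math.ceil(1.5 * left * partsScanned / bufferedRows).toInt
    Math.min(estimate, partsScanned * limitScaleUpFactor)
  }
}
```

This keeps the cost of small limits low (scan one partition first) while bounding the number of extra job rounds for selective queries.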
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
index 1a542937338..c0f9d61c210 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
@@ -17,18 +17,87 @@
package org.apache.spark.sql.kyuubi
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.SparkContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.{ByteUnit, JavaUtils}
import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.{CollectLimitExec, LocalTableScanExec, SparkPlan, SQLExecution}
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec
+import org.apache.spark.sql.execution.arrow.KyuubiArrowConverters
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
+import org.apache.kyuubi.engine.spark.KyuubiSparkUtil
import org.apache.kyuubi.engine.spark.schema.RowSet
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils.quoteIfNeeded
+import org.apache.kyuubi.util.reflect.DynMethods
+import org.apache.kyuubi.util.reflect.ReflectUtils._
+
+object SparkDatasetHelper extends Logging {
+
+ def executeCollect(df: DataFrame): Array[Array[Byte]] = withNewExecutionId(df) {
+ executeArrowBatchCollect(df.queryExecution.executedPlan)
+ }
+
+ def executeArrowBatchCollect: SparkPlan => Array[Array[Byte]] = {
+ case adaptiveSparkPlan: AdaptiveSparkPlanExec =>
+ executeArrowBatchCollect(finalPhysicalPlan(adaptiveSparkPlan))
+ // TODO: avoid extra shuffle if `offset` > 0
+ case collectLimit: CollectLimitExec if offset(collectLimit) > 0 =>
+ logWarning("unsupported offset > 0, an extra shuffle will be introduced.")
+ toArrowBatchRdd(collectLimit).collect()
+ case collectLimit: CollectLimitExec if collectLimit.limit >= 0 =>
+ doCollectLimit(collectLimit)
+ case collectLimit: CollectLimitExec if collectLimit.limit < 0 =>
+ executeArrowBatchCollect(collectLimit.child)
+ // TODO: replace with pattern match once we drop Spark 3.1 support.
+ case command: SparkPlan if isCommandResultExec(command) =>
+ doCommandResultExec(command)
+ case localTableScan: LocalTableScanExec =>
+ doLocalTableScan(localTableScan)
+ case plan: SparkPlan =>
+ toArrowBatchRdd(plan).collect()
+ }
-object SparkDatasetHelper {
def toArrowBatchRdd[T](ds: Dataset[T]): RDD[Array[Byte]] = {
ds.toArrowBatchRdd
}
+ /**
+ * Forked from [[Dataset.toArrowBatchRdd(plan: SparkPlan)]].
+ * Convert to an RDD of serialized ArrowRecordBatches.
+ */
+ def toArrowBatchRdd(plan: SparkPlan): RDD[Array[Byte]] = {
+ val schemaCaptured = plan.schema
+ // TODO: SparkPlan.session was introduced in SPARK-35798; use it instead of
+ // SparkSession.active once we drop Spark 3.1.x support.
+ val maxRecordsPerBatch = SparkSession.active.sessionState.conf.arrowMaxRecordsPerBatch
+ val timeZoneId = SparkSession.active.sessionState.conf.sessionLocalTimeZone
+ // note that we can't pass the lazy val `maxBatchSize` directly, because the input
+ // arguments are serialized and sent to the executor side for execution.
+ val maxBatchSizePerBatch = maxBatchSize
+ plan.execute().mapPartitionsInternal { iter =>
+ KyuubiArrowConverters.toBatchIterator(
+ iter,
+ schemaCaptured,
+ maxRecordsPerBatch,
+ maxBatchSizePerBatch,
+ -1,
+ timeZoneId)
+ }
+ }
+
+ def toArrowBatchLocalIterator(df: DataFrame): Iterator[Array[Byte]] = {
+ withNewExecutionId(df) {
+ toArrowBatchRdd(df).toLocalIterator
+ }
+ }
+
def convertTopLevelComplexTypeToHiveString(
df: DataFrame,
timestampAsString: Boolean): DataFrame = {
@@ -64,15 +133,149 @@ object SparkDatasetHelper {
df.select(cols: _*)
}
+ private lazy val maxBatchSize: Long = {
+ // respect spark connect config
+ KyuubiSparkUtil.globalSparkContext
+ .getConf
+ .getOption("spark.connect.grpc.arrow.maxBatchSize")
+ .orElse(Option("4m"))
+ .map(JavaUtils.byteStringAs(_, ByteUnit.MiB))
+ .get
+ }
+
+ private def doCollectLimit(collectLimit: CollectLimitExec): Array[Array[Byte]] = {
+ // TODO: SparkPlan.session was introduced in SPARK-35798; use it instead of
+ // SparkSession.active once we drop Spark 3.1.x support.
+ val timeZoneId = SparkSession.active.sessionState.conf.sessionLocalTimeZone
+ val maxRecordsPerBatch = SparkSession.active.sessionState.conf.arrowMaxRecordsPerBatch
+
+ val batches = KyuubiArrowConverters.takeAsArrowBatches(
+ collectLimit,
+ maxRecordsPerBatch,
+ maxBatchSize,
+ timeZoneId)
+
+ // note that the number of rows in the returned Arrow batches may be >= `limit`, so slice
+ // the result down to exactly `limit` rows
+ val result = ArrayBuffer[Array[Byte]]()
+ var i = 0
+ var rest = collectLimit.limit
+ while (i < batches.length && rest > 0) {
+ val (batch, size) = batches(i)
+ if (size <= rest) {
+ result += batch
+ // the whole batch fits within the remaining limit, so the toInt conversion is safe
+ rest -= size.toInt
+ } else { // size > rest
+ result += KyuubiArrowConverters.slice(collectLimit.schema, timeZoneId, batch, 0, rest)
+ rest = 0
+ }
+ i += 1
+ }
+ result.toArray
+ }
+
+ private lazy val commandResultExecRowsMethod = DynMethods.builder("rows")
+ .impl("org.apache.spark.sql.execution.CommandResultExec")
+ .build()
+
+ private def doCommandResultExec(command: SparkPlan): Array[Array[Byte]] = {
+ val spark = SparkSession.active
+ // TODO: replace with `command.rows` once we drop Spark 3.1 support.
+ val rows = commandResultExecRowsMethod.invoke[Seq[InternalRow]](command)
+ command.longMetric("numOutputRows").add(rows.size)
+ sendDriverMetrics(spark.sparkContext, command.metrics)
+ KyuubiArrowConverters.toBatchIterator(
+ rows.iterator,
+ command.schema,
+ spark.sessionState.conf.arrowMaxRecordsPerBatch,
+ maxBatchSize,
+ -1,
+ spark.sessionState.conf.sessionLocalTimeZone).toArray
+ }
+
+ private def doLocalTableScan(localTableScan: LocalTableScanExec): Array[Array[Byte]] = {
+ val spark = SparkSession.active
+ localTableScan.longMetric("numOutputRows").add(localTableScan.rows.size)
+ sendDriverMetrics(spark.sparkContext, localTableScan.metrics)
+ KyuubiArrowConverters.toBatchIterator(
+ localTableScan.rows.iterator,
+ localTableScan.schema,
+ spark.sessionState.conf.arrowMaxRecordsPerBatch,
+ maxBatchSize,
+ -1,
+ spark.sessionState.conf.sessionLocalTimeZone).toArray
+ }
+
/**
- * Fork from Apache Spark-3.3.1 org.apache.spark.sql.catalyst.util.quoteIfNeeded to adapt to
- * Spark-3.1.x
+ * This method provides a reflection-based implementation of
+ * [[AdaptiveSparkPlanExec.finalPhysicalPlan]] that enables us to adapt to the Spark runtime
+ * without patching SPARK-41914.
+ *
+ * TODO: Once we drop support for Spark 3.1.x, we can directly call
+ * [[AdaptiveSparkPlanExec.finalPhysicalPlan]].
*/
- def quoteIfNeeded(part: String): String = {
- if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) {
- part
- } else {
- s"`${part.replace("`", "``")}`"
+ def finalPhysicalPlan(adaptiveSparkPlanExec: AdaptiveSparkPlanExec): SparkPlan = {
+ withFinalPlanUpdate(adaptiveSparkPlanExec, identity)
+ }
+
+ /**
+ * A reflection-based implementation of [[AdaptiveSparkPlanExec.withFinalPlanUpdate]].
+ */
+ private def withFinalPlanUpdate[T](
+ adaptiveSparkPlanExec: AdaptiveSparkPlanExec,
+ fun: SparkPlan => T): T = {
+ val plan = invokeAs[SparkPlan](adaptiveSparkPlanExec, "getFinalPhysicalPlan")
+ val result = fun(plan)
+ invokeAs[Unit](adaptiveSparkPlanExec, "finalPlanUpdate")
+ result
+ }
+
+ /**
+ * Offset support was added in Spark 3.4 (see SPARK-28330). To ensure backward compatibility
+ * with earlier versions of Spark, this function uses a reflective call to "offset".
+ */
+ private def offset(collectLimitExec: CollectLimitExec): Int = {
+ Option(
+ DynMethods.builder("offset")
+ .impl(collectLimitExec.getClass)
+ .orNoop()
+ .build()
+ .invoke[Int](collectLimitExec))
+ .getOrElse(0)
+ }
+
+ private def isCommandResultExec(sparkPlan: SparkPlan): Boolean = {
+ // scalastyle:off line.size.limit
+ // CommandResultExec was introduced in SPARK-35378 (Spark 3.2); since then, the
+ // physical plan of a runnable command is CommandResultExec.
+ // for instance:
+ // ```
+ // scala> spark.sql("show tables").queryExecution.executedPlan
+ // res0: org.apache.spark.sql.execution.SparkPlan =
+ // CommandResult , [namespace#0, tableName#1, isTemporary#2]
+ // +- ShowTables [namespace#0, tableName#1, isTemporary#2], V2SessionCatalog(spark_catalog), [default]
+ //
+ // scala > spark.sql("show tables").queryExecution.executedPlan.getClass
+ // res1: Class[_ <: org.apache.spark.sql.execution.SparkPlan] = class org.apache.spark.sql.execution.CommandResultExec
+ // ```
+ // scalastyle:on line.size.limit
+ sparkPlan.getClass.getName == "org.apache.spark.sql.execution.CommandResultExec"
+ }
+
+ /**
+ * Modeled on org.apache.spark.sql.Dataset#withAction(): assigns a new execution id to the
+ * arrow-based operation, so that we can track arrow-based queries on the UI tab.
+ */
+ private def withNewExecutionId[T](df: DataFrame)(body: => T): T = {
+ SQLExecution.withNewExecutionId(df.queryExecution, Some("collectAsArrow")) {
+ df.queryExecution.executedPlan.resetMetrics()
+ body
}
}
+
+ private def sendDriverMetrics(sc: SparkContext, metrics: Map[String, SQLMetric]): Unit = {
+ val executionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+ SQLMetrics.postDriverMetricUpdates(sc, executionId, metrics.values.toSeq)
+ }
}
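Annotation: the `offset` helper above relies on Kyuubi's `DynMethods` to resolve a method that only exists on some Spark versions. Stripped of that utility, the underlying pattern is plain reflection with a graceful fallback. The sketch below is an assumed standalone equivalent for illustration, not the actual Kyuubi API:

```scala
def optionalIntMethod(target: AnyRef, name: String): Option[Int] =
  try {
    // getMethod throws if the method is absent on this Spark version
    Some(target.getClass.getMethod(name).invoke(target).asInstanceOf[Int])
  } catch {
    case _: NoSuchMethodException => None
  }

// e.g. optionalIntMethod(collectLimitExec, "offset").getOrElse(0)
```

`DynMethods.orNoop()` achieves the same effect declaratively, plus caching of the resolved `Method` handle.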
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EnginePage.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EnginePage.scala
index a2a2931f411..7188ac62f62 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EnginePage.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EnginePage.scala
@@ -29,7 +29,7 @@ import org.apache.commons.text.StringEscapeUtils
import org.apache.spark.ui.TableSourceUtil._
import org.apache.spark.ui.UIUtils._
-import org.apache.kyuubi.{KYUUBI_VERSION, Utils}
+import org.apache.kyuubi._
import org.apache.kyuubi.engine.spark.events.{SessionEvent, SparkOperationEvent}
case class EnginePage(parent: EngineTab) extends WebUIPage("") {
@@ -58,6 +58,15 @@ case class EnginePage(parent: EngineTab) extends WebUIPage("") {
Kyuubi Version:
{KYUUBI_VERSION}
+
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineSessionPage.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineSessionPage.scala
index 1f34ae64f12..cdfc6d31355 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineSessionPage.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineSessionPage.scala
@@ -42,7 +42,7 @@ case class EngineSessionPage(parent: EngineTab)
require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")
val content = store.synchronized { // make sure all parts in this page are consistent
- val sessionStat = store.getSession(parameterId).getOrElse(null)
+ val sessionStat = store.getSession(parameterId).orNull
require(sessionStat != null, "Invalid sessionID[" + parameterId + "]")
val redactionPattern = parent.sparkUI match {
@@ -51,7 +51,7 @@ case class EngineSessionPage(parent: EngineTab)
}
val sessionPropertiesTable =
- if (sessionStat.conf != null && !sessionStat.conf.isEmpty) {
+ if (sessionStat.conf != null && sessionStat.conf.nonEmpty) {
val table = UIUtils.listingTable(
propertyHeader,
propertyRow,
@@ -78,8 +78,18 @@ case class EngineSessionPage(parent: EngineTab)
User {sessionStat.username},
IP {sessionStat.ip},
- Server {sessionStat.serverIp},
+ Server {sessionStat.serverIp}
+
++
+
Session created at {formatDate(sessionStat.startTime)},
+ {
+ if (sessionStat.endTime > 0) {
+ s"""
+ | ended at ${formatDate(sessionStat.endTime)},
+ | after ${formatDuration(sessionStat.duration)}.
+ |""".stripMargin
+ }
+ }
Total run {sessionStat.totalOperations} SQL
++
sessionPropertiesTable ++
diff --git a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineTab.scala b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineTab.scala
index b7cebbd97eb..52edcf2200a 100644
--- a/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineTab.scala
+++ b/externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/ui/EngineTab.scala
@@ -26,7 +26,7 @@ import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.engine.spark.SparkSQLEngine
import org.apache.kyuubi.engine.spark.events.EngineEventsStore
import org.apache.kyuubi.service.ServiceState
-import org.apache.kyuubi.util.ClassUtils
+import org.apache.kyuubi.util.reflect.{DynClasses, DynMethods}
/**
* Note that [[SparkUITab]] is private for Spark
@@ -62,31 +62,35 @@ case class EngineTab(
sparkUI.foreach { ui =>
try {
- // Spark shade the jetty package so here we use reflection
- val sparkServletContextHandlerClz = loadSparkServletContextHandler
- val attachHandlerMethod = Class.forName("org.apache.spark.ui.SparkUI")
- .getMethod("attachHandler", sparkServletContextHandlerClz)
- val createRedirectHandlerMethod = Class.forName("org.apache.spark.ui.JettyUtils")
- .getMethod(
- "createRedirectHandler",
+ // [KYUUBI #3627]: the official spark release uses the shaded and relocated jetty classes,
+ // but if we use sbt to build for testing, e.g. docker image, it still uses the vanilla
+ // jetty classes.
+ val sparkServletContextHandlerClz = DynClasses.builder()
+ .impl("org.sparkproject.jetty.servlet.ServletContextHandler")
+ .impl("org.eclipse.jetty.servlet.ServletContextHandler")
+ .buildChecked()
+ val attachHandlerMethod = DynMethods.builder("attachHandler")
+ .impl("org.apache.spark.ui.SparkUI", sparkServletContextHandlerClz)
+ .buildChecked(ui)
+ val createRedirectHandlerMethod = DynMethods.builder("createRedirectHandler")
+ .impl(
+ "org.apache.spark.ui.JettyUtils",
classOf[String],
classOf[String],
- classOf[(HttpServletRequest) => Unit],
+ classOf[HttpServletRequest => Unit],
classOf[String],
classOf[Set[String]])
+ .buildStaticChecked()
attachHandlerMethod
.invoke(
- ui,
createRedirectHandlerMethod
- .invoke(null, "/kyuubi/stop", "/kyuubi", handleKillRequest _, "", Set("GET", "POST")))
+ .invoke("/kyuubi/stop", "/kyuubi", handleKillRequest _, "", Set("GET", "POST")))
attachHandlerMethod
.invoke(
- ui,
createRedirectHandlerMethod
.invoke(
- null,
"/kyuubi/gracefulstop",
"/kyuubi",
handleGracefulKillRequest _,
@@ -105,18 +109,6 @@ case class EngineTab(
cause)
}
- private def loadSparkServletContextHandler: Class[_] = {
- // [KYUUBI #3627]: the official spark release uses the shaded and relocated jetty classes,
- // but if use sbt to build for testing, e.g. docker image, it still uses vanilla jetty classes.
- val shaded = "org.sparkproject.jetty.servlet.ServletContextHandler"
- val vanilla = "org.eclipse.jetty.servlet.ServletContextHandler"
- if (ClassUtils.classIsLoadable(shaded)) {
- Class.forName(shaded)
- } else {
- Class.forName(vanilla)
- }
- }
-
def handleKillRequest(request: HttpServletRequest): Unit = {
if (killEnabled && engine.isDefined && engine.get.getServiceState != ServiceState.STOPPED) {
engine.get.stop()
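Annotation: the `DynClasses.builder().impl(...).impl(...)` chain above encodes "first class name that loads wins". A minimal sketch of that resolution order without the Kyuubi utility (an assumed helper for illustration):

```scala
def firstLoadable(candidates: String*): Class[_] =
  candidates.view
    .flatMap { name =>
      try Some(Class.forName(name))
      catch { case _: ClassNotFoundException => None }
    }
    .headOption
    .getOrElse(throw new ClassNotFoundException(candidates.mkString(", ")))

// firstLoadable(
//   "org.sparkproject.jetty.servlet.ServletContextHandler", // shaded, official releases
//   "org.eclipse.jetty.servlet.ServletContextHandler")      // vanilla, sbt builds
```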
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/EtcdShareLevelSparkEngineSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/EtcdShareLevelSparkEngineSuite.scala
index 46dc3b54c13..727b232e3f8 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/EtcdShareLevelSparkEngineSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/EtcdShareLevelSparkEngineSuite.scala
@@ -17,9 +17,7 @@
package org.apache.kyuubi.engine.spark
-import org.apache.kyuubi.config.KyuubiConf.ENGINE_CHECK_INTERVAL
-import org.apache.kyuubi.config.KyuubiConf.ENGINE_SHARE_LEVEL
-import org.apache.kyuubi.config.KyuubiConf.ENGINE_SPARK_MAX_LIFETIME
+import org.apache.kyuubi.config.KyuubiConf.{ENGINE_CHECK_INTERVAL, ENGINE_SHARE_LEVEL, ENGINE_SPARK_MAX_INITIAL_WAIT, ENGINE_SPARK_MAX_LIFETIME}
import org.apache.kyuubi.engine.ShareLevel
import org.apache.kyuubi.engine.ShareLevel.ShareLevel
@@ -30,6 +28,7 @@ trait EtcdShareLevelSparkEngineSuite
etcdConf ++ Map(
ENGINE_SHARE_LEVEL.key -> shareLevel.toString,
ENGINE_SPARK_MAX_LIFETIME.key -> "PT20s",
+ ENGINE_SPARK_MAX_INITIAL_WAIT.key -> "0",
ENGINE_CHECK_INTERVAL.key -> "PT5s")
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SchedulerPoolSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SchedulerPoolSuite.scala
index af8c90cf29e..a07f7d78382 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SchedulerPoolSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SchedulerPoolSuite.scala
@@ -19,6 +19,9 @@ package org.apache.kyuubi.engine.spark
import java.util.concurrent.Executors
+import scala.concurrent.duration.SECONDS
+
+import org.apache.spark.KyuubiSparkContextHelper
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}
import org.scalatest.concurrent.PatienceConfiguration.Timeout
import org.scalatest.time.SpanSugar.convertIntToGrainOfTime
@@ -76,33 +79,36 @@ class SchedulerPoolSuite extends WithSparkSQLEngine with HiveJDBCTestHelper {
eventually(Timeout(3.seconds)) {
assert(job0Started)
}
- Seq(1, 0).foreach { priority =>
- threads.execute(() => {
- priority match {
- case 0 =>
- withJdbcStatement() { statement =>
- statement.execute("SET kyuubi.operation.scheduler.pool=p0")
- statement.execute("SELECT java_method('java.lang.Thread', 'sleep', 1500l)" +
- "FROM range(1, 3, 1, 2)")
- }
-
- case 1 =>
- withJdbcStatement() { statement =>
- statement.execute("SET kyuubi.operation.scheduler.pool=p1")
- statement.execute("SELECT java_method('java.lang.Thread', 'sleep', 1500l)" +
- " FROM range(1, 3, 1, 2)")
- }
- }
- })
+ threads.execute(() => {
+ // job name job1
+ withJdbcStatement() { statement =>
+ statement.execute("SET kyuubi.operation.scheduler.pool=p1")
+ statement.execute("SELECT java_method('java.lang.Thread', 'sleep', 1500l)" +
+ " FROM range(1, 3, 1, 2)")
+ }
+ })
+ // make sure job1 started before job2
+ eventually(Timeout(2.seconds)) {
+ assert(job1StartTime > 0)
}
+
+ threads.execute(() => {
+ // job name job2
+ withJdbcStatement() { statement =>
+ statement.execute("SET kyuubi.operation.scheduler.pool=p0")
+ statement.execute("SELECT java_method('java.lang.Thread', 'sleep', 1500l)" +
+ "FROM range(1, 3, 1, 2)")
+ }
+ })
threads.shutdown()
- eventually(Timeout(20.seconds)) {
- // We can not ensure that job1 is started before job2 so here using abs.
- assert(Math.abs(job1StartTime - job2StartTime) < 1000)
- // Job1 minShare is 2(total resource) so that job2 should be allocated tasks after
- // job1 finished.
- assert(job2FinishTime - job1FinishTime >= 1000)
- }
+ threads.awaitTermination(20, SECONDS)
+ // make sure the SparkListener has received the finished events for job1 and job2.
+ KyuubiSparkContextHelper.waitListenerBus(spark)
+ // job1 should be started before job2
+ assert(job1StartTime < job2StartTime)
+ // job2's minShare is 2 (the total resources), so job1 is allocated tasks only
+ // after job2 finishes.
+ assert(job2FinishTime < job1FinishTime)
} finally {
spark.sparkContext.removeSparkListener(listener)
}
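For context, FAIR scheduler pools such as `p0` and `p1` referenced by `kyuubi.operation.scheduler.pool` above are typically declared in a `fairscheduler.xml` file. A hypothetical sketch of such a config (pool names match the test; the `minShare`/`weight` values are illustrative, the suite's actual allocation file may differ):

```xml
<allocations>
  <!-- p1 gets minShare equal to the total cores, so its tasks run first -->
  <pool name="p1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="p0">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```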
diff --git a/extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduClientSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendServiceSuite.scala
similarity index 70%
rename from extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduClientSuite.scala
rename to externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendServiceSuite.scala
index eebb4719cc2..5f81e51f825 100644
--- a/extensions/spark/kyuubi-spark-connector-kudu/src/test/scala/org/apache/kyuubi/spark/connector/kudu/KuduClientSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/SparkTBinaryFrontendServiceSuite.scala
@@ -15,18 +15,15 @@
* limitations under the License.
*/
-package org.apache.kyuubi.spark.connector.kudu
+package org.apache.kyuubi.engine.spark
-import org.apache.kudu.client.KuduClient
+import org.apache.hadoop.conf.Configuration
import org.apache.kyuubi.KyuubiFunSuite
-class KuduClientSuite extends KyuubiFunSuite with KuduMixin {
-
- test("kudu client") {
- val builder = new KuduClient.KuduClientBuilder(kuduMasterUrl)
- val kuduClient = builder.build()
-
- assert(kuduClient.findLeaderMasterServer().getPort === kuduMasterPort)
+class SparkTBinaryFrontendServiceSuite extends KyuubiFunSuite {
+ test("new hive conf") {
+ val hiveConf = SparkTBinaryFrontendService.hiveConf(new Configuration())
+ assert(hiveConf.getClass().getName == SparkTBinaryFrontendService.HIVE_CONF_CLASSNAME)
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/WithSparkSQLEngine.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/WithSparkSQLEngine.scala
index 629a8374b12..3b98c2efb16 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/WithSparkSQLEngine.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/WithSparkSQLEngine.scala
@@ -21,7 +21,7 @@ import org.apache.spark.sql.SparkSession
import org.apache.kyuubi.{KyuubiFunSuite, Utils}
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.sparkMajorMinorVersion
+import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.SPARK_ENGINE_RUNTIME_VERSION
trait WithSparkSQLEngine extends KyuubiFunSuite {
protected var spark: SparkSession = _
@@ -34,14 +34,8 @@ trait WithSparkSQLEngine extends KyuubiFunSuite {
// Affected by such configuration' default value
// engine.initialize.sql='SHOW DATABASES'
- protected var initJobId: Int = {
- sparkMajorMinorVersion match {
- case (3, minor) if minor >= 2 => 1 // SPARK-35378
- case (3, _) => 0
- case _ =>
- throw new IllegalArgumentException(s"Not Support spark version $sparkMajorMinorVersion")
- }
- }
+ // SPARK-35378
+ protected lazy val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "3.2") 1 else 0
override def beforeAll(): Unit = {
startSparkEngine()
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/ZookeeperShareLevelSparkEngineSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/ZookeeperShareLevelSparkEngineSuite.scala
index 4ef96e61a58..f24abb36c0e 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/ZookeeperShareLevelSparkEngineSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/ZookeeperShareLevelSparkEngineSuite.scala
@@ -19,6 +19,7 @@ package org.apache.kyuubi.engine.spark
import org.apache.kyuubi.config.KyuubiConf.ENGINE_CHECK_INTERVAL
import org.apache.kyuubi.config.KyuubiConf.ENGINE_SHARE_LEVEL
+import org.apache.kyuubi.config.KyuubiConf.ENGINE_SPARK_MAX_INITIAL_WAIT
import org.apache.kyuubi.config.KyuubiConf.ENGINE_SPARK_MAX_LIFETIME
import org.apache.kyuubi.engine.ShareLevel
import org.apache.kyuubi.engine.ShareLevel.ShareLevel
@@ -30,6 +31,7 @@ trait ZookeeperShareLevelSparkEngineSuite
zookeeperConf ++ Map(
ENGINE_SHARE_LEVEL.key -> shareLevel.toString,
ENGINE_SPARK_MAX_LIFETIME.key -> "PT20s",
+ ENGINE_SPARK_MAX_INITIAL_WAIT.key -> "0",
ENGINE_CHECK_INTERVAL.key -> "PT5s")
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkArrowbasedOperationSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkArrowbasedOperationSuite.scala
index ae6237bb59c..d3d4a56d783 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkArrowbasedOperationSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkArrowbasedOperationSuite.scala
@@ -17,19 +17,36 @@
package org.apache.kyuubi.engine.spark.operation
+import java.lang.{Boolean => JBoolean}
import java.sql.Statement
+import java.util.{Locale, Set => JSet}
-import org.apache.spark.KyuubiSparkContextHelper
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
-import org.apache.spark.sql.execution.QueryExecution
+import org.apache.spark.{KyuubiSparkContextHelper, TaskContext}
+import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.execution.{CollectLimitExec, LocalTableScanExec, QueryExecution, SparkPlan}
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec
+import org.apache.spark.sql.execution.exchange.Exchange
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, SortMergeJoinExec}
+import org.apache.spark.sql.execution.metric.SparkMetricsTestUtils
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.kyuubi.SparkDatasetHelper
+import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.QueryExecutionListener
+import org.apache.kyuubi.KyuubiException
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.engine.spark.{SparkSQLEngine, WithSparkSQLEngine}
import org.apache.kyuubi.engine.spark.session.SparkSessionImpl
import org.apache.kyuubi.operation.SparkDataTypeTests
+import org.apache.kyuubi.util.reflect.{DynFields, DynMethods}
+import org.apache.kyuubi.util.reflect.ReflectUtils._
-class SparkArrowbasedOperationSuite extends WithSparkSQLEngine with SparkDataTypeTests {
+class SparkArrowbasedOperationSuite extends WithSparkSQLEngine with SparkDataTypeTests
+ with SparkMetricsTestUtils {
override protected def jdbcUrl: String = getJdbcUrl
@@ -46,6 +63,16 @@ class SparkArrowbasedOperationSuite extends WithSparkSQLEngine with SparkDataTyp
withJdbcStatement() { statement =>
checkResultSetFormat(statement, "arrow")
}
+ spark.catalog.listTables()
+ .collect()
+ .foreach { table =>
+ if (table.isTemporary) {
+ spark.catalog.dropTempView(table.name)
+ } else {
+ spark.sql(s"DROP TABLE IF EXISTS ${table.name}")
+ }
+ ()
+ }
}
test("detect resultSet format") {
@@ -92,52 +119,277 @@ class SparkArrowbasedOperationSuite extends WithSparkSQLEngine with SparkDataTyp
}
test("assign a new execution id for arrow-based result") {
- var plan: LogicalPlan = null
-
- val listener = new QueryExecutionListener {
- override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
- plan = qe.analyzed
+ val listener = new SQLMetricsListener
+ withJdbcStatement() { statement =>
+ withSparkListener(listener) {
+ val result = statement.executeQuery("select 1 as c1")
+ assert(result.next())
+ assert(result.getInt("c1") == 1)
}
- override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
}
+
+ assert(listener.queryExecution.analyzed.isInstanceOf[Project])
+ }
+
+ test("arrow-based query metrics") {
+ val listener = new SQLMetricsListener
withJdbcStatement() { statement =>
- // since all the new sessions have their owner listener bus, we should register the listener
- // in the current session.
- registerListener(listener)
+ withSparkListener(listener) {
+ val result = statement.executeQuery("select 1 as c1")
+ assert(result.next())
+ assert(result.getInt("c1") == 1)
+ }
+ }
+
+ val metrics = listener.queryExecution.executedPlan.collectLeaves().head.metrics
+ assert(metrics.contains("numOutputRows"))
+ assert(metrics("numOutputRows").value === 1)
+ }
+
+ test("SparkDatasetHelper.executeArrowBatchCollect should return expect row count") {
+ val returnSize = Seq(
+ 0, // the Spark optimizer guarantees `limit != 0`; this is just a sanity check
+ 7, // less than one partition
+ 10, // equal to one partition
+ 13, // between one and two partitions, run two jobs
+ 20, // equal to two partitions
+ 29, // between two and three partitions
+ 1000, // all partitions
+ 1001) // more than total row count
+
+ def runAndCheck(sparkPlan: SparkPlan, expectSize: Int): Unit = {
+ val arrowBinary = SparkDatasetHelper.executeArrowBatchCollect(sparkPlan)
+ val rows = fromBatchIterator(
+ arrowBinary.iterator,
+ sparkPlan.schema,
+ "",
+ true,
+ KyuubiSparkContextHelper.dummyTaskContext())
+ assert(rows.size == expectSize)
+ }
+
+ val excludedRules = Seq(
+ "org.apache.spark.sql.catalyst.optimizer.EliminateLimits",
+ "org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero",
+ "org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation").mkString(",")
+ withSQLConf(
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> excludedRules,
+ SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES.key -> excludedRules) {
+ // AQE enabled: the outermost plan node is AdaptiveSparkPlanExec
+ spark.range(1000)
+ .repartitionByRange(100, col("id"))
+ .createOrReplaceTempView("t_1")
+ spark.sql("select * from t_1")
+ .foreachPartition { p: Iterator[Row] =>
+ assert(p.length == 10)
+ ()
+ }
+ returnSize.foreach { size =>
+ val df = spark.sql(s"select * from t_1 limit $size")
+ val headPlan = df.queryExecution.executedPlan.collectLeaves().head
+ if (SPARK_ENGINE_RUNTIME_VERSION >= "3.2") {
+ assert(headPlan.isInstanceOf[AdaptiveSparkPlanExec])
+ val finalPhysicalPlan =
+ SparkDatasetHelper.finalPhysicalPlan(headPlan.asInstanceOf[AdaptiveSparkPlanExec])
+ assert(finalPhysicalPlan.isInstanceOf[CollectLimitExec])
+ }
+ if (size > 1000) {
+ runAndCheck(df.queryExecution.executedPlan, 1000)
+ } else {
+ runAndCheck(df.queryExecution.executedPlan, size)
+ }
+ }
- val result = statement.executeQuery("select 1 as c1")
- assert(result.next())
- assert(result.getInt("c1") == 1)
+ // outermost CollectLimitExec
+ spark.range(0, 1000, 1, numPartitions = 100)
+ .createOrReplaceTempView("t_2")
+ spark.sql("select * from t_2")
+ .foreachPartition { p: Iterator[Row] =>
+ assert(p.length == 10)
+ ()
+ }
+ returnSize.foreach { size =>
+ val df = spark.sql(s"select * from t_2 limit $size")
+ val plan = df.queryExecution.executedPlan
+ assert(plan.isInstanceOf[CollectLimitExec])
+ if (size > 1000) {
+ runAndCheck(df.queryExecution.executedPlan, 1000)
+ } else {
+ runAndCheck(df.queryExecution.executedPlan, size)
+ }
+ }
}
- KyuubiSparkContextHelper.waitListenerBus(spark)
- unregisterListener(listener)
- assert(plan.isInstanceOf[Project])
}
- test("arrow-based query metrics") {
- var queryExecution: QueryExecution = null
+ test("aqe should work properly") {
+
+ val s = spark
+ import s.implicits._
+
+ spark.sparkContext.parallelize(
+ (1 to 100).map(i => TestData(i, i.toString))).toDF()
+ .createOrReplaceTempView("testData")
+ spark.sparkContext.parallelize(
+ TestData2(1, 1) ::
+ TestData2(1, 2) ::
+ TestData2(2, 1) ::
+ TestData2(2, 2) ::
+ TestData2(3, 1) ::
+ TestData2(3, 2) :: Nil,
+ 2).toDF()
+ .createOrReplaceTempView("testData2")
+
+ withSQLConf(
+ SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+ SQLConf.SHUFFLE_PARTITIONS.key -> "5",
+ SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "80") {
+ val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(
+ """
+ |SELECT * FROM(
+ | SELECT * FROM testData join testData2 ON key = a where value = '1'
+ |) LIMIT 1
+ |""".stripMargin)
+ val smj = plan.collect { case smj: SortMergeJoinExec => smj }
+ val bhj = adaptivePlan.collect { case bhj: BroadcastHashJoinExec => bhj }
+ assert(smj.size == 1)
+ assert(bhj.size == 1)
+ }
+ }
+
+ test("result offset support") {
+ assume(SPARK_ENGINE_RUNTIME_VERSION >= "3.4")
+ var numStages = 0
+ val listener = new SparkListener {
+ override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+ numStages = jobStart.stageInfos.length
+ }
+ }
+ withJdbcStatement() { statement =>
+ withSparkListener(listener) {
+ withPartitionedTable("t_3") {
+ statement.executeQuery("select * from t_3 limit 10 offset 10")
+ }
+ }
+ }
+ // an extra shuffle is introduced if `offset` > 0
+ assert(numStages == 2)
+ }
- val listener = new QueryExecutionListener {
- override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
- queryExecution = qe
+ test("arrow serialization should not introduce extra shuffle for outermost limit") {
+ var numStages = 0
+ val listener = new SparkListener {
+ override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+ numStages = jobStart.stageInfos.length
}
- override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
}
withJdbcStatement() { statement =>
- registerListener(listener)
- val result = statement.executeQuery("select 1 as c1")
- assert(result.next())
- assert(result.getInt("c1") == 1)
+ withSparkListener(listener) {
+ withPartitionedTable("t_3") {
+ statement.executeQuery("select * from t_3 limit 1000")
+ }
+ }
+ }
+ // Should be only one stage since there is no shuffle.
+ assert(numStages == 1)
+ }
+
+ test("CommandResultExec should not trigger job") {
+ val listener = new JobCountListener
+ val l2 = new SQLMetricsListener
+ val nodeName = spark.sql("SHOW TABLES").queryExecution.executedPlan.getClass.getName
+ if (SPARK_ENGINE_RUNTIME_VERSION < "3.2") {
+ assert(nodeName == "org.apache.spark.sql.execution.command.ExecutedCommandExec")
+ } else {
+ assert(nodeName == "org.apache.spark.sql.execution.CommandResultExec")
}
+ withJdbcStatement("table_1") { statement =>
+ statement.executeQuery("CREATE TABLE table_1 (id bigint) USING parquet")
+ withSparkListener(listener) {
+ withSparkListener(l2) {
+ val resultSet = statement.executeQuery("SHOW TABLES")
+ assert(resultSet.next())
+ assert(resultSet.getString("tableName") == "table_1")
+ }
+ }
+ }
+
+ if (SPARK_ENGINE_RUNTIME_VERSION < "3.2") {
+ // Note that before Spark 3.2, a LocalTableScan SparkPlan will be submitted, and the issue of
+ // preventing LocalTableScan from triggering a job submission was addressed in [KYUUBI #4710].
+ assert(l2.queryExecution.executedPlan.getClass.getName ==
+ "org.apache.spark.sql.execution.LocalTableScanExec")
+ } else {
+ assert(l2.queryExecution.executedPlan.getClass.getName ==
+ "org.apache.spark.sql.execution.CommandResultExec")
+ }
+ assert(listener.numJobs == 0)
+ }
+
+ test("LocalTableScanExec should not trigger job") {
+ val listener = new JobCountListener
+ withJdbcStatement("view_1") { statement =>
+ withSparkListener(listener) {
+ withAllSessions { s =>
+ import s.implicits._
+ Seq((1, "a")).toDF("c1", "c2").createOrReplaceTempView("view_1")
+ val plan = s.sql("select * from view_1").queryExecution.executedPlan
+ assert(plan.isInstanceOf[LocalTableScanExec])
+ }
+ val resultSet = statement.executeQuery("select * from view_1")
+ assert(resultSet.next())
+ assert(!resultSet.next())
+ }
+ }
+ assert(listener.numJobs == 0)
+ }
- KyuubiSparkContextHelper.waitListenerBus(spark)
- unregisterListener(listener)
+ test("LocalTableScanExec metrics") {
+ val listener = new SQLMetricsListener
+ withJdbcStatement("view_1") { statement =>
+ withSparkListener(listener) {
+ withAllSessions { s =>
+ import s.implicits._
+ Seq((1, "a")).toDF("c1", "c2").createOrReplaceTempView("view_1")
+ }
+ val result = statement.executeQuery("select * from view_1")
+ assert(result.next())
+ assert(!result.next())
+ }
+ }
- val metrics = queryExecution.executedPlan.collectLeaves().head.metrics
+ val metrics = listener.queryExecution.executedPlan.collectLeaves().head.metrics
assert(metrics.contains("numOutputRows"))
assert(metrics("numOutputRows").value === 1)
}
+ test("post LocalTableScanExec driver-side metrics") {
+ val expectedMetrics = Map(
+ 0L -> (("LocalTableScan", Map("number of output rows" -> "2"))))
+ withTables("view_1") {
+ val s = spark
+ import s.implicits._
+ Seq((1, "a"), (2, "b")).toDF("c1", "c2").createOrReplaceTempView("view_1")
+ val df = spark.sql("SELECT * FROM view_1")
+ val metrics = getSparkPlanMetrics(df)
+ assert(metrics == expectedMetrics)
+ }
+ }
+
+ test("post CommandResultExec driver-side metrics") {
+ spark.sql("show tables").show(truncate = false)
+ assume(SPARK_ENGINE_RUNTIME_VERSION >= "3.2")
+ val expectedMetrics = Map(
+ 0L -> (("CommandResult", Map("number of output rows" -> "2"))))
+ withTables("table_1", "table_2") {
+ spark.sql("CREATE TABLE table_1 (id bigint) USING parquet")
+ spark.sql("CREATE TABLE table_2 (id bigint) USING parquet")
+ val df = spark.sql("SHOW TABLES")
+ val metrics = getSparkPlanMetrics(df)
+ assert(metrics == expectedMetrics)
+ }
+ }
+
private def checkResultSetFormat(statement: Statement, expectFormat: String): Unit = {
val query =
s"""
@@ -160,21 +412,184 @@ class SparkArrowbasedOperationSuite extends WithSparkSQLEngine with SparkDataTyp
assert(resultSet.getString("col") === expect)
}
- private def registerListener(listener: QueryExecutionListener): Unit = {
- // since all the new sessions have their owner listener bus, we should register the listener
- // in the current session.
- SparkSQLEngine.currentEngine.get
- .backendService
- .sessionManager
- .allSessions()
- .foreach(_.asInstanceOf[SparkSessionImpl].spark.listenerManager.register(listener))
+ // since each new session has its own listener bus, the listener must be
+ // registered in every active session.
+ private def withSparkListener[T](listener: QueryExecutionListener)(body: => T): T = {
+ withAllSessions(s => s.listenerManager.register(listener))
+ try {
+ val result = body
+ KyuubiSparkContextHelper.waitListenerBus(spark)
+ result
+ } finally {
+ withAllSessions(s => s.listenerManager.unregister(listener))
+ }
+ }
+
+ // since each new session has its own listener bus, the listener must be
+ // registered in every active session.
+ private def withSparkListener[T](listener: SparkListener)(body: => T): T = {
+ withAllSessions(s => s.sparkContext.addSparkListener(listener))
+ try {
+ val result = body
+ KyuubiSparkContextHelper.waitListenerBus(spark)
+ result
+ } finally {
+ withAllSessions(s => s.sparkContext.removeSparkListener(listener))
+ }
+ }
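The two `withSparkListener` helpers above follow the classic loan pattern: install a resource, run the body, and always uninstall it in `finally`, even when the body throws. A minimal self-contained sketch of that shape (names here are illustrative, not Kyuubi or Spark APIs):

```scala
import scala.collection.mutable

// Loan pattern: temporarily install a "listener", run the body, and always
// remove it again in a finally block, even when the body throws.
object LoanPatternSketch {
  val listeners: mutable.Set[String] = mutable.Set.empty

  def withListener[T](name: String)(body: => T): T = {
    listeners += name          // register before the body runs
    try body
    finally listeners -= name  // always unregister, even on failure
  }
}
```

Kyuubi's helpers additionally drain the listener bus (`KyuubiSparkContextHelper.waitListenerBus`) before unregistering, so events still queued when the body finishes are not lost.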
+
+ private def withPartitionedTable[T](viewName: String)(body: => T): T = {
+ withAllSessions { spark =>
+ spark.range(0, 1000, 1, numPartitions = 100)
+ .createOrReplaceTempView(viewName)
+ }
+ try {
+ body
+ } finally {
+ withAllSessions { spark =>
+ spark.sql(s"DROP VIEW IF EXISTS $viewName")
+ }
+ }
}
- private def unregisterListener(listener: QueryExecutionListener): Unit = {
+ private def withAllSessions(op: SparkSession => Unit): Unit = {
SparkSQLEngine.currentEngine.get
.backendService
.sessionManager
.allSessions()
- .foreach(_.asInstanceOf[SparkSessionImpl].spark.listenerManager.unregister(listener))
+ .map(_.asInstanceOf[SparkSessionImpl].spark)
+ .foreach(op(_))
+ }
+
+ private def runAdaptiveAndVerifyResult(query: String): (SparkPlan, SparkPlan) = {
+ val dfAdaptive = spark.sql(query)
+ val planBefore = dfAdaptive.queryExecution.executedPlan
+ val result = dfAdaptive.collect()
+ withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
+ val df = spark.sql(query)
+ QueryTest.checkAnswer(df, df.collect().toSeq)
+ }
+ val planAfter = dfAdaptive.queryExecution.executedPlan
+ val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
+ val exchanges = adaptivePlan.collect {
+ case e: Exchange => e
+ }
+ assert(exchanges.isEmpty, "The final plan should not contain any Exchange node.")
+ (dfAdaptive.queryExecution.sparkPlan, adaptivePlan)
+ }
+
+ /**
+ * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
+ * configurations.
+ */
+ protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+ val conf = SQLConf.get
+ val (keys, values) = pairs.unzip
+ val currentValues = keys.map { key =>
+ if (conf.contains(key)) {
+ Some(conf.getConfString(key))
+ } else {
+ None
+ }
+ }
+ (keys, values).zipped.foreach { (k, v) =>
+ if (isStaticConfigKey(k)) {
+ throw new KyuubiException(s"Cannot modify the value of a static config: $k")
+ }
+ conf.setConfString(k, v)
+ }
+ try f
+ finally {
+ keys.zip(currentValues).foreach {
+ case (key, Some(value)) => conf.setConfString(key, value)
+ case (key, None) => conf.unsetConf(key)
+ }
+ }
+ }
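`withSQLConf` above is the standard save-and-restore idiom: remember each key's prior state, apply the overrides, and restore in `finally`. A self-contained sketch of the same shape over a plain mutable map (illustrative only; the real helper goes through `SQLConf` and rejects static keys):

```scala
import scala.collection.mutable

object WithConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given keys, run f, then restore every key to its previous
  // state: the old value if one existed, removed entirely if not.
  def withConf(pairs: (String, String)*)(f: => Unit): Unit = {
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try f
    finally saved.foreach {
      case (k, Some(v)) => conf(k) = v
      case (k, None) => conf.remove(k)
    }
  }
}
```

The `None` branch matters: a key that was unset before must be unset again afterwards, not left behind with the override value.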
+
+ private def withTables[T](tableNames: String*)(f: => T): T = {
+ try {
+ f
+ } finally {
+ tableNames.foreach { name =>
+ if (name.toUpperCase(Locale.ROOT).startsWith("VIEW")) {
+ spark.sql(s"DROP VIEW IF EXISTS $name")
+ } else {
+ spark.sql(s"DROP TABLE IF EXISTS $name")
+ }
+ }
+ }
+ }
+
+ /**
+ * This method provides a reflection-based implementation of [[SQLConf.isStaticConfigKey]] to
+ * support Spark 3.1.x
+ *
+ * TODO: Once we drop support for Spark 3.1.x, we can directly call
+ * [[SQLConf.isStaticConfigKey()]].
+ */
+ private def isStaticConfigKey(key: String): Boolean =
+ getField[JSet[String]]((SQLConf.getClass, SQLConf), "staticConfKeys").contains(key)
+
+ // the signature of [[ArrowConverters.fromBatchIterator]] was changed by SPARK-43528
+ // (since Spark 3.5)
+ private lazy val fromBatchIteratorMethod = DynMethods.builder("fromBatchIterator")
+ .hiddenImpl( // for Spark 3.4 or previous
+ "org.apache.spark.sql.execution.arrow.ArrowConverters$",
+ classOf[Iterator[Array[Byte]]],
+ classOf[StructType],
+ classOf[String],
+ classOf[TaskContext])
+ .hiddenImpl( // for Spark 3.5 or later
+ "org.apache.spark.sql.execution.arrow.ArrowConverters$",
+ classOf[Iterator[Array[Byte]]],
+ classOf[StructType],
+ classOf[String],
+ classOf[Boolean],
+ classOf[TaskContext])
+ .build()
+
+ def fromBatchIterator(
+ arrowBatchIter: Iterator[Array[Byte]],
+ schema: StructType,
+ timeZoneId: String,
+ errorOnDuplicatedFieldNames: JBoolean,
+ context: TaskContext): Iterator[InternalRow] = {
+ val className = "org.apache.spark.sql.execution.arrow.ArrowConverters$"
+ val instance = DynFields.builder().impl(className, "MODULE$").build[Object]().get(null)
+ if (SPARK_ENGINE_RUNTIME_VERSION >= "3.5") {
+ fromBatchIteratorMethod.invoke[Iterator[InternalRow]](
+ instance,
+ arrowBatchIter,
+ schema,
+ timeZoneId,
+ errorOnDuplicatedFieldNames,
+ context)
+ } else {
+ fromBatchIteratorMethod.invoke[Iterator[InternalRow]](
+ instance,
+ arrowBatchIter,
+ schema,
+ timeZoneId,
+ context)
+ }
+ }
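The `DynMethods` builder above resolves whichever overload exists in the running Spark version and invokes it. A self-contained sketch of the same try-each-signature idea using plain Java reflection (the class and method names here are illustrative):

```scala
import java.lang.reflect.Method
import scala.util.Try

object DynDispatchSketch {
  class Target {
    def greet(name: String): String = s"hello, $name"
  }

  // Try each candidate parameter list in order and return the first
  // overload that actually exists on the class.
  def resolve(cls: Class[_], name: String, sigs: Seq[Seq[Class[_]]]): Method =
    sigs.view
      .flatMap(sig => Try(cls.getMethod(name, sig: _*)).toOption)
      .headOption
      .getOrElse(throw new NoSuchMethodException(name))
}
```

`DynMethods.builder(...).hiddenImpl(...)` does essentially this, with extra support for non-public members and Scala companion-object (`MODULE$`) instances.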
+
+ class JobCountListener extends SparkListener {
+ var numJobs = 0
+ override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+ numJobs += 1
+ }
+ }
+
+ class SQLMetricsListener extends QueryExecutionListener {
+ var queryExecution: QueryExecution = _
+ override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
+ queryExecution = qe
+ }
+ override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
}
}
+
+case class TestData(key: Int, value: String)
+case class TestData2(a: Int, b: Int)
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkCatalogDatabaseOperationSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkCatalogDatabaseOperationSuite.scala
index 46208bff1e5..5ee01bda16e 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkCatalogDatabaseOperationSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkCatalogDatabaseOperationSuite.scala
@@ -22,7 +22,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.kyuubi.config.KyuubiConf.ENGINE_OPERATION_CONVERT_CATALOG_DATABASE_ENABLED
import org.apache.kyuubi.engine.spark.WithSparkSQLEngine
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.operation.HiveJDBCTestHelper
class SparkCatalogDatabaseOperationSuite extends WithSparkSQLEngine with HiveJDBCTestHelper {
@@ -37,7 +37,7 @@ class SparkCatalogDatabaseOperationSuite extends WithSparkSQLEngine with HiveJDB
test("set/get current catalog") {
withJdbcStatement() { statement =>
val catalog = statement.getConnection.getCatalog
- assert(catalog == SparkCatalogShim.SESSION_CATALOG)
+ assert(catalog == SparkCatalogUtils.SESSION_CATALOG)
statement.getConnection.setCatalog("dummy")
val changedCatalog = statement.getConnection.getCatalog
assert(changedCatalog == "dummy")
@@ -61,7 +61,7 @@ class DummyCatalog extends CatalogPlugin {
_name = name
}
- private var _name: String = null
+ private var _name: String = _
override def name(): String = _name
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala
index af514ceb3c0..adab0231d63 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/operation/SparkOperationSuite.scala
@@ -32,13 +32,14 @@ import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
import org.apache.spark.sql.types._
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.SemanticVersion
import org.apache.kyuubi.engine.spark.WithSparkSQLEngine
import org.apache.kyuubi.engine.spark.schema.SchemaHelper.TIMESTAMP_NTZ
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
+import org.apache.kyuubi.jdbc.hive.KyuubiStatement
import org.apache.kyuubi.operation.{HiveMetadataTests, SparkQueryTests}
import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant._
import org.apache.kyuubi.util.KyuubiHadoopUtils
+import org.apache.kyuubi.util.SemanticVersion
class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with SparkQueryTests {
@@ -49,7 +50,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
withJdbcStatement() { statement =>
val meta = statement.getConnection.getMetaData
val types = meta.getTableTypes
- val expected = SparkCatalogShim.sparkTableTypes.toIterator
+ val expected = SparkCatalogUtils.sparkTableTypes.toIterator
while (types.next()) {
assert(types.getString(TABLE_TYPE) === expected.next())
}
@@ -143,7 +144,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
var pos = 0
while (rowSet.next()) {
- assert(rowSet.getString(TABLE_CAT) === SparkCatalogShim.SESSION_CATALOG)
+ assert(rowSet.getString(TABLE_CAT) === SparkCatalogUtils.SESSION_CATALOG)
assert(rowSet.getString(TABLE_SCHEM) === defaultSchema)
assert(rowSet.getString(TABLE_NAME) === tableName)
assert(rowSet.getString(COLUMN_NAME) === schema(pos).name)
@@ -201,7 +202,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val data = statement.getConnection.getMetaData
val rowSet = data.getColumns("", "global_temp", viewName, null)
while (rowSet.next()) {
- assert(rowSet.getString(TABLE_CAT) === SparkCatalogShim.SESSION_CATALOG)
+ assert(rowSet.getString(TABLE_CAT) === SparkCatalogUtils.SESSION_CATALOG)
assert(rowSet.getString(TABLE_SCHEM) === "global_temp")
assert(rowSet.getString(TABLE_NAME) === viewName)
assert(rowSet.getString(COLUMN_NAME) === "i")
@@ -228,7 +229,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val data = statement.getConnection.getMetaData
val rowSet = data.getColumns("", "global_temp", viewName, "n")
while (rowSet.next()) {
- assert(rowSet.getString(TABLE_CAT) === SparkCatalogShim.SESSION_CATALOG)
+ assert(rowSet.getString(TABLE_CAT) === SparkCatalogUtils.SESSION_CATALOG)
assert(rowSet.getString(TABLE_SCHEM) === "global_temp")
assert(rowSet.getString(TABLE_NAME) === viewName)
assert(rowSet.getString(COLUMN_NAME) === "n")
@@ -306,28 +307,28 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val tFetchResultsReq1 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_NEXT, 1)
val tFetchResultsResp1 = client.FetchResults(tFetchResultsReq1)
assert(tFetchResultsResp1.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq1 = tFetchResultsResp1.getResults.getColumns.get(0).getI64Val.getValues.asScala.toSeq
+ val idSeq1 = tFetchResultsResp1.getResults.getColumns.get(0).getI64Val.getValues.asScala
assertResult(Seq(0L))(idSeq1)
// fetch next from first row
val tFetchResultsReq2 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_NEXT, 1)
val tFetchResultsResp2 = client.FetchResults(tFetchResultsReq2)
assert(tFetchResultsResp2.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq2 = tFetchResultsResp2.getResults.getColumns.get(0).getI64Val.getValues.asScala.toSeq
+ val idSeq2 = tFetchResultsResp2.getResults.getColumns.get(0).getI64Val.getValues.asScala
assertResult(Seq(1L))(idSeq2)
// fetch prior from second row, expected got first row
val tFetchResultsReq3 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_PRIOR, 1)
val tFetchResultsResp3 = client.FetchResults(tFetchResultsReq3)
assert(tFetchResultsResp3.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq3 = tFetchResultsResp3.getResults.getColumns.get(0).getI64Val.getValues.asScala.toSeq
+ val idSeq3 = tFetchResultsResp3.getResults.getColumns.get(0).getI64Val.getValues.asScala
assertResult(Seq(0L))(idSeq3)
// fetch first
val tFetchResultsReq4 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_FIRST, 3)
val tFetchResultsResp4 = client.FetchResults(tFetchResultsReq4)
assert(tFetchResultsResp4.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq4 = tFetchResultsResp4.getResults.getColumns.get(0).getI64Val.getValues.asScala.toSeq
+ val idSeq4 = tFetchResultsResp4.getResults.getColumns.get(0).getI64Val.getValues.asScala
assertResult(Seq(0L, 1L))(idSeq4)
}
}
@@ -349,7 +350,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val tFetchResultsResp1 = client.FetchResults(tFetchResultsReq1)
assert(tFetchResultsResp1.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq1 = tFetchResultsResp1.getResults.getColumns.get(0)
- .getI64Val.getValues.asScala.toSeq
+ .getI64Val.getValues.asScala
assertResult(Seq(0L))(idSeq1)
// fetch next from first row
@@ -357,7 +358,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val tFetchResultsResp2 = client.FetchResults(tFetchResultsReq2)
assert(tFetchResultsResp2.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq2 = tFetchResultsResp2.getResults.getColumns.get(0)
- .getI64Val.getValues.asScala.toSeq
+ .getI64Val.getValues.asScala
assertResult(Seq(1L))(idSeq2)
// fetch prior from second row, expected got first row
@@ -365,7 +366,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val tFetchResultsResp3 = client.FetchResults(tFetchResultsReq3)
assert(tFetchResultsResp3.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq3 = tFetchResultsResp3.getResults.getColumns.get(0)
- .getI64Val.getValues.asScala.toSeq
+ .getI64Val.getValues.asScala
assertResult(Seq(0L))(idSeq3)
// fetch first
@@ -373,7 +374,7 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
val tFetchResultsResp4 = client.FetchResults(tFetchResultsReq4)
assert(tFetchResultsResp4.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq4 = tFetchResultsResp4.getResults.getColumns.get(0)
- .getI64Val.getValues.asScala.toSeq
+ .getI64Val.getValues.asScala
assertResult(Seq(0L, 1L))(idSeq4)
}
}
@@ -728,6 +729,14 @@ class SparkOperationSuite extends WithSparkSQLEngine with HiveMetadataTests with
}
}
+ test("KYUUBI #5030: Support get query id in Spark engine") {
+ withJdbcStatement() { stmt =>
+ stmt.executeQuery("SELECT 1")
+ val queryId = stmt.asInstanceOf[KyuubiStatement].getQueryId
+ assert(queryId != null && queryId.nonEmpty)
+ }
+ }
+
private def whenMetaStoreURIsSetTo(uris: String)(func: String => Unit): Unit = {
val conf = spark.sparkContext.hadoopConfiguration
val origin = conf.get("hive.metastore.uris", "")
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/session/SessionSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/session/SessionSuite.scala
index 5e0b6c28e0f..b89c560b30c 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/session/SessionSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/session/SessionSuite.scala
@@ -27,7 +27,9 @@ import org.apache.kyuubi.service.ServiceState._
class SessionSuite extends WithSparkSQLEngine with HiveJDBCTestHelper {
override def withKyuubiConf: Map[String, String] = {
- Map(ENGINE_SHARE_LEVEL.key -> "CONNECTION")
+ Map(
+ ENGINE_SHARE_LEVEL.key -> "CONNECTION",
+ ENGINE_SPARK_MAX_INITIAL_WAIT.key -> "0")
}
override protected def beforeEach(): Unit = {
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunctionSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunctionSuite.scala
index f355e1e6b51..7a3f8c94071 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunctionSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/engine/spark/udf/KyuubiDefinedFunctionSuite.scala
@@ -19,24 +19,23 @@ package org.apache.kyuubi.engine.spark.udf
import java.nio.file.Paths
-import org.apache.kyuubi.{KyuubiFunSuite, MarkdownBuilder, MarkdownUtils, Utils}
+import org.apache.kyuubi.{KyuubiFunSuite, MarkdownBuilder, Utils}
+import org.apache.kyuubi.util.GoldenFileUtils._
-// scalastyle:off line.size.limit
/**
* End-to-end test cases for configuration doc file
- * The golden result file is "docs/sql/functions.md".
+ * The golden result file is "docs/extensions/engines/spark/functions.md".
*
* To run the entire test suite:
* {{{
- * build/mvn clean test -pl externals/kyuubi-spark-sql-engine -am -Pflink-provided,spark-provided,hive-provided -DwildcardSuites=org.apache.kyuubi.engine.spark.udf.KyuubiDefinedFunctionSuite
+ * KYUUBI_UPDATE=0 dev/gen/gen_spark_kdf_docs.sh
* }}}
*
* To re-generate golden files for entire suite, run:
* {{{
- * KYUUBI_UPDATE=1 build/mvn clean test -pl externals/kyuubi-spark-sql-engine -am -Pflink-provided,spark-provided,hive-provided -DwildcardSuites=org.apache.kyuubi.engine.spark.udf.KyuubiDefinedFunctionSuite
+ * dev/gen/gen_spark_kdf_docs.sh
* }}}
*/
-// scalastyle:on line.size.limit
class KyuubiDefinedFunctionSuite extends KyuubiFunSuite {
private val kyuubiHome: String = Utils.getCodeSourceLocation(getClass)
@@ -48,24 +47,18 @@ class KyuubiDefinedFunctionSuite extends KyuubiFunSuite {
test("verify or update kyuubi spark sql functions") {
val builder = MarkdownBuilder(licenced = true, getClass.getName)
- builder
- .line("# Auxiliary SQL Functions")
- .line("""Kyuubi provides several auxiliary SQL functions as supplement to Spark's
+ builder += "# Auxiliary SQL Functions" +=
+ """Kyuubi provides several auxiliary SQL functions as supplement to Spark's
| [Built-in Functions](https://spark.apache.org/docs/latest/api/sql/index.html#
- |built-in-functions)""")
- .lines("""
+ |built-in-functions)""" ++=
+ """
| Name | Description | Return Type | Since
| --- | --- | --- | ---
- |
- |""")
+ |"""
KDFRegistry.registeredFunctions.foreach { func =>
- builder.line(s"${func.name} | ${func.description} | ${func.returnType} | ${func.since}")
+ builder += s"${func.name} | ${func.description} | ${func.returnType} | ${func.since}"
}
- MarkdownUtils.verifyOutput(
- markdown,
- builder,
- getClass.getCanonicalName,
- "externals/kyuubi-spark-sql-engine")
+ verifyOrRegenerateGoldenFile(markdown, builder.toMarkdown, "dev/gen/gen_spark_kdf_docs.sh")
}
}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/jdbc/KyuubiHiveDriverSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/jdbc/KyuubiHiveDriverSuite.scala
index 4d3c754980d..ae68440df3e 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/jdbc/KyuubiHiveDriverSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/kyuubi/jdbc/KyuubiHiveDriverSuite.scala
@@ -22,7 +22,7 @@ import java.util.Properties
import org.apache.kyuubi.IcebergSuiteMixin
import org.apache.kyuubi.engine.spark.WithSparkSQLEngine
-import org.apache.kyuubi.engine.spark.shim.SparkCatalogShim
+import org.apache.kyuubi.engine.spark.util.SparkCatalogUtils
import org.apache.kyuubi.jdbc.hive.{KyuubiConnection, KyuubiStatement}
import org.apache.kyuubi.tags.IcebergTest
@@ -47,15 +47,15 @@ class KyuubiHiveDriverSuite extends WithSparkSQLEngine with IcebergSuiteMixin {
val metaData = connection.getMetaData
assert(metaData.getClass.getName === "org.apache.kyuubi.jdbc.hive.KyuubiDatabaseMetaData")
val statement = connection.createStatement()
- val table1 = s"${SparkCatalogShim.SESSION_CATALOG}.default.kyuubi_hive_jdbc"
+ val table1 = s"${SparkCatalogUtils.SESSION_CATALOG}.default.kyuubi_hive_jdbc"
val table2 = s"$catalog.default.hdp_cat_tbl"
try {
statement.execute(s"CREATE TABLE $table1(key int) USING parquet")
statement.execute(s"CREATE TABLE $table2(key int) USING $format")
- val resultSet1 = metaData.getTables(SparkCatalogShim.SESSION_CATALOG, "default", "%", null)
+ val resultSet1 = metaData.getTables(SparkCatalogUtils.SESSION_CATALOG, "default", "%", null)
assert(resultSet1.next())
- assert(resultSet1.getString(1) === SparkCatalogShim.SESSION_CATALOG)
+ assert(resultSet1.getString(1) === SparkCatalogUtils.SESSION_CATALOG)
assert(resultSet1.getString(2) === "default")
assert(resultSet1.getString(3) === "kyuubi_hive_jdbc")
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/KyuubiSparkContextHelper.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/KyuubiSparkContextHelper.scala
index 8293123ead7..1b662eadf96 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/KyuubiSparkContextHelper.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/KyuubiSparkContextHelper.scala
@@ -27,4 +27,6 @@ object KyuubiSparkContextHelper {
def waitListenerBus(spark: SparkSession): Unit = {
spark.sparkContext.listenerBus.waitUntilEmpty()
}
+
+ def dummyTaskContext(): TaskContextImpl = TaskContext.empty()
}
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/kyuubi/SparkSQLEngineDeregisterSuite.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/kyuubi/SparkSQLEngineDeregisterSuite.scala
index 8dc93759b93..4dddcd4eef3 100644
--- a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/kyuubi/SparkSQLEngineDeregisterSuite.scala
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/kyuubi/SparkSQLEngineDeregisterSuite.scala
@@ -24,9 +24,8 @@ import org.apache.spark.sql.internal.SQLConf.ANSI_ENABLED
import org.scalatest.time.SpanSugar.convertIntToGrainOfTime
import org.apache.kyuubi.config.KyuubiConf._
-import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.sparkMajorMinorVersion
-import org.apache.kyuubi.engine.spark.WithDiscoverySparkSQLEngine
-import org.apache.kyuubi.engine.spark.WithEmbeddedZookeeper
+import org.apache.kyuubi.engine.spark.{WithDiscoverySparkSQLEngine, WithEmbeddedZookeeper}
+import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.SPARK_ENGINE_RUNTIME_VERSION
import org.apache.kyuubi.service.ServiceState
abstract class SparkSQLEngineDeregisterSuite
@@ -61,10 +60,11 @@ abstract class SparkSQLEngineDeregisterSuite
class SparkSQLEngineDeregisterExceptionSuite extends SparkSQLEngineDeregisterSuite {
override def withKyuubiConf: Map[String, String] = {
super.withKyuubiConf ++ Map(ENGINE_DEREGISTER_EXCEPTION_CLASSES.key -> {
- sparkMajorMinorVersion match {
+ if (SPARK_ENGINE_RUNTIME_VERSION >= "3.3") {
// see https://issues.apache.org/jira/browse/SPARK-35958
- case (3, minor) if minor > 2 => "org.apache.spark.SparkArithmeticException"
- case _ => classOf[ArithmeticException].getCanonicalName
+ "org.apache.spark.SparkArithmeticException"
+ } else {
+ classOf[ArithmeticException].getCanonicalName
}
})
@@ -94,10 +94,11 @@ class SparkSQLEngineDeregisterExceptionTTLSuite
zookeeperConf ++ Map(
ANSI_ENABLED.key -> "true",
ENGINE_DEREGISTER_EXCEPTION_CLASSES.key -> {
- sparkMajorMinorVersion match {
+ if (SPARK_ENGINE_RUNTIME_VERSION >= "3.3") {
// see https://issues.apache.org/jira/browse/SPARK-35958
- case (3, minor) if minor > 2 => "org.apache.spark.SparkArithmeticException"
- case _ => classOf[ArithmeticException].getCanonicalName
+ "org.apache.spark.SparkArithmeticException"
+ } else {
+ classOf[ArithmeticException].getCanonicalName
}
},
ENGINE_DEREGISTER_JOB_MAX_FAILURES.key -> maxJobFailures.toString,
diff --git a/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/sql/execution/metric/SparkMetricsTestUtils.scala b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/sql/execution/metric/SparkMetricsTestUtils.scala
new file mode 100644
index 00000000000..7ab06f0ef18
--- /dev/null
+++ b/externals/kyuubi-spark-sql-engine/src/test/scala/org/apache/spark/sql/execution/metric/SparkMetricsTestUtils.scala
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.metric
+
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.execution.SparkPlanInfo
+import org.apache.spark.sql.execution.ui.SparkPlanGraph
+import org.apache.spark.sql.kyuubi.SparkDatasetHelper
+
+import org.apache.kyuubi.engine.spark.WithSparkSQLEngine
+
+trait SparkMetricsTestUtils {
+ this: WithSparkSQLEngine =>
+
+ private lazy val statusStore = spark.sharedState.statusStore
+ private def currentExecutionIds(): Set[Long] = {
+ spark.sparkContext.listenerBus.waitUntilEmpty(10000)
+ statusStore.executionsList.map(_.executionId).toSet
+ }
+
+ protected def getSparkPlanMetrics(df: DataFrame): Map[Long, (String, Map[String, Any])] = {
+ val previousExecutionIds = currentExecutionIds()
+ SparkDatasetHelper.executeCollect(df)
+ spark.sparkContext.listenerBus.waitUntilEmpty(10000)
+ val executionIds = currentExecutionIds().diff(previousExecutionIds)
+ assert(executionIds.size === 1)
+ val executionId = executionIds.head
+ val metricValues = statusStore.executionMetrics(executionId)
+ SparkPlanGraph(SparkPlanInfo.fromSparkPlan(df.queryExecution.executedPlan)).allNodes
+ .map { node =>
+ val nodeMetrics = node.metrics.map { metric =>
+ val metricValue = metricValues(metric.accumulatorId)
+ (metric.name, metricValue)
+ }.toMap
+ (node.id, node.name -> nodeMetrics)
+ }.toMap
+ }
+}
diff --git a/externals/kyuubi-trino-engine/pom.xml b/externals/kyuubi-trino-engine/pom.xml
index 7aea8f33a6f..7d91e4a864f 100644
--- a/externals/kyuubi-trino-engine/pom.xml
+++ b/externals/kyuubi-trino-engine/pom.xml
@@ -21,11 +21,11 @@
    <parent>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
        <relativePath>../../pom.xml</relativePath>
    </parent>

-    <artifactId>kyuubi-trino-engine_2.12</artifactId>
+    <artifactId>kyuubi-trino-engine_${scala.binary.version}</artifactId>
    <packaging>jar</packaging>
    <name>Kyuubi Project Engine Trino</name>
    <url>https://kyuubi.apache.org/</url>
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/ExecuteStatement.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/ExecuteStatement.scala
index eb1b273007d..3e7cce80cdf 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/ExecuteStatement.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/ExecuteStatement.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.engine.trino.operation
import java.util.concurrent.RejectedExecutionException
-import org.apache.hive.service.rpc.thrift.TRowSet
+import org.apache.hive.service.rpc.thrift.TFetchResultsResp
import org.apache.kyuubi.{KyuubiSQLException, Logging}
import org.apache.kyuubi.engine.trino.TrinoStatement
@@ -82,7 +82,9 @@ class ExecuteStatement(
}
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
@@ -97,7 +99,10 @@ class ExecuteStatement(
val taken = iter.take(rowSetSize)
val resultRowSet = RowSet.toTRowSet(taken.toList, schema, getProtocolVersion)
resultRowSet.setStartRowOffset(iter.getPosition)
- resultRowSet
+ val fetchResultsResp = new TFetchResultsResp(OK_STATUS)
+ fetchResultsResp.setResults(resultRowSet)
+ fetchResultsResp.setHasMoreRows(false)
+ fetchResultsResp
}
private def executeStatement(trinoStatement: TrinoStatement): Unit = {
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentCatalog.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentCatalog.scala
index 3d8c7fd6c5b..504a53a4149 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentCatalog.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentCatalog.scala
@@ -23,11 +23,16 @@ import io.trino.client.ClientStandardTypes.VARCHAR
import io.trino.client.ClientTypeSignature.VARCHAR_UNBOUNDED_LENGTH
import org.apache.kyuubi.operation.IterableFetchIterator
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class GetCurrentCatalog(session: Session)
extends TrinoOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
val session = trinoContext.clientSession.get
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentDatabase.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentDatabase.scala
index 3bf2987b46a..3ab598ef09e 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentDatabase.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/GetCurrentDatabase.scala
@@ -23,11 +23,16 @@ import io.trino.client.ClientStandardTypes.VARCHAR
import io.trino.client.ClientTypeSignature.VARCHAR_UNBOUNDED_LENGTH
import org.apache.kyuubi.operation.IterableFetchIterator
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class GetCurrentDatabase(session: Session)
extends TrinoOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
val session = trinoContext.clientSession.get
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentCatalog.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentCatalog.scala
index 09ba4262f70..16836b0a97d 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentCatalog.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentCatalog.scala
@@ -19,11 +19,16 @@ package org.apache.kyuubi.engine.trino.operation
import io.trino.client.ClientSession
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentCatalog(session: Session, catalog: String)
extends TrinoOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
val session = trinoContext.clientSession.get
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentDatabase.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentDatabase.scala
index f25cc9e0c6d..aa4697f5f0e 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentDatabase.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/SetCurrentDatabase.scala
@@ -19,11 +19,16 @@ package org.apache.kyuubi.engine.trino.operation
import io.trino.client.ClientSession
+import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.session.Session
class SetCurrentDatabase(session: Session, database: String)
extends TrinoOperation(session) {
+ private val operationLog: OperationLog = OperationLog.createOperationLog(session, getHandle)
+
+ override def getOperationLog: Option[OperationLog] = Option(operationLog)
+
override protected def runInternal(): Unit = {
try {
val session = trinoContext.clientSession.get
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperation.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperation.scala
index 6e40f65f290..11eaa1bc1d7 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperation.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperation.scala
@@ -21,7 +21,7 @@ import java.io.IOException
import io.trino.client.Column
import io.trino.client.StatementClient
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TRowSet}
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp}
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.Utils
@@ -54,7 +54,9 @@ abstract class TrinoOperation(session: Session) extends AbstractOperation(sessio
resp
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
@@ -66,7 +68,10 @@ abstract class TrinoOperation(session: Session) extends AbstractOperation(sessio
val taken = iter.take(rowSetSize)
val resultRowSet = RowSet.toTRowSet(taken.toList, schema, getProtocolVersion)
resultRowSet.setStartRowOffset(iter.getPosition)
- resultRowSet
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(resultRowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override protected def beforeRun(): Unit = {
@@ -75,7 +80,7 @@ abstract class TrinoOperation(session: Session) extends AbstractOperation(sessio
}
override protected def afterRun(): Unit = {
- state.synchronized {
+ withLockRequired {
if (!isTerminalState(state)) {
setState(OperationState.FINISHED)
}
@@ -108,7 +113,7 @@ abstract class TrinoOperation(session: Session) extends AbstractOperation(sessio
// could be thrown.
case e: Throwable =>
if (cancel && trino.isRunning) trino.cancelLeafStage()
- state.synchronized {
+ withLockRequired {
val errMsg = Utils.stringifyException(e)
if (state == OperationState.TIMEOUT) {
val ke = KyuubiSQLException(s"Timeout operating $opType: $errMsg")
diff --git a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/session/TrinoSessionImpl.scala b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/session/TrinoSessionImpl.scala
index 81f973b1b5e..362ee3ed06a 100644
--- a/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/session/TrinoSessionImpl.scala
+++ b/externals/kyuubi-trino-engine/src/main/scala/org/apache/kyuubi/engine/trino/session/TrinoSessionImpl.scala
@@ -24,6 +24,7 @@ import java.util.concurrent.TimeUnit
import io.airlift.units.Duration
import io.trino.client.ClientSession
+import io.trino.client.OkHttpUtil
import okhttp3.OkHttpClient
import org.apache.hive.service.rpc.thrift.{TGetInfoType, TGetInfoValue, TProtocolVersion}
@@ -35,7 +36,7 @@ import org.apache.kyuubi.engine.trino.{TrinoConf, TrinoContext, TrinoStatement}
import org.apache.kyuubi.engine.trino.event.TrinoSessionEvent
import org.apache.kyuubi.events.EventBus
import org.apache.kyuubi.operation.{Operation, OperationHandle}
-import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager}
+import org.apache.kyuubi.session.{AbstractSession, SessionHandle, SessionManager, USE_CATALOG, USE_DATABASE}
class TrinoSessionImpl(
protocol: TProtocolVersion,
@@ -46,50 +47,51 @@ class TrinoSessionImpl(
sessionManager: SessionManager)
extends AbstractSession(protocol, user, password, ipAddress, conf, sessionManager) {
+ val sessionConf: KyuubiConf = sessionManager.getConf
+
override val handle: SessionHandle =
conf.get(KYUUBI_SESSION_HANDLE_KEY).map(SessionHandle.fromUUID).getOrElse(SessionHandle())
+ private val username: String = sessionConf
+ .getOption(KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY).getOrElse(currentUser)
+
var trinoContext: TrinoContext = _
private var clientSession: ClientSession = _
- private var catalogName: String = null
- private var databaseName: String = null
-
+ private var catalogName: String = _
+ private var databaseName: String = _
private val sessionEvent = TrinoSessionEvent(this)
override def open(): Unit = {
- normalizedConf.foreach {
- case ("use:catalog", catalog) => catalogName = catalog
- case ("use:database", database) => databaseName = database
- case _ => // do nothing
+
+ val (useCatalogAndDatabaseConf, _) = normalizedConf.partition { case (k, _) =>
+ Array(USE_CATALOG, USE_DATABASE).contains(k)
}
- val httpClient = new OkHttpClient.Builder().build()
+ useCatalogAndDatabaseConf.foreach {
+ case (USE_CATALOG, catalog) => catalogName = catalog
+ case (USE_DATABASE, database) => databaseName = database
+ }
+ if (catalogName == null) {
+ catalogName = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_CATALOG)
+ .getOrElse(throw KyuubiSQLException("Trino default catalog can not be null!"))
+ }
clientSession = createClientSession()
- trinoContext = TrinoContext(httpClient, clientSession)
+ trinoContext = TrinoContext(createHttpClient(), clientSession)
super.open()
EventBus.post(sessionEvent)
}
private def createClientSession(): ClientSession = {
- val sessionConf = sessionManager.getConf
val connectionUrl = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_URL).getOrElse(
throw KyuubiSQLException("Trino server url can not be null!"))
- if (catalogName == null) {
- catalogName = sessionConf.get(
- KyuubiConf.ENGINE_TRINO_CONNECTION_CATALOG).getOrElse(
- throw KyuubiSQLException("Trino default catalog can not be null!"))
- }
-
- val user = sessionConf
- .getOption(KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY).getOrElse(currentUser)
val clientRequestTimeout = sessionConf.get(TrinoConf.CLIENT_REQUEST_TIMEOUT)
new ClientSession(
URI.create(connectionUrl),
- user,
+ username,
Optional.empty(),
"kyuubi",
Optional.empty(),
@@ -110,6 +112,37 @@ class TrinoSessionImpl(
true)
}
+ private def createHttpClient(): OkHttpClient = {
+ val keystorePath = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_KEYSTORE_PATH)
+ val keystorePassword = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_KEYSTORE_PASSWORD)
+ val keystoreType = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_KEYSTORE_TYPE)
+ val truststorePath = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_TRUSTSTORE_PATH)
+ val truststorePassword = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_TRUSTSTORE_PASSWORD)
+ val truststoreType = sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_TRUSTSTORE_TYPE)
+
+ val serverScheme = clientSession.getServer.getScheme
+
+ val builder = new OkHttpClient.Builder()
+
+ OkHttpUtil.setupSsl(
+ builder,
+ Optional.ofNullable(keystorePath.orNull),
+ Optional.ofNullable(keystorePassword.orNull),
+ Optional.ofNullable(keystoreType.orNull),
+ Optional.ofNullable(truststorePath.orNull),
+ Optional.ofNullable(truststorePassword.orNull),
+ Optional.ofNullable(truststoreType.orNull))
+
+ sessionConf.get(KyuubiConf.ENGINE_TRINO_CONNECTION_PASSWORD).foreach { password =>
+ require(
+ serverScheme.equalsIgnoreCase("https"),
+ "Trino engine using username/password requires HTTPS to be enabled")
+ builder.addInterceptor(OkHttpUtil.basicAuth(username, password))
+ }
+
+ builder.build()
+ }
+
override protected def runOperation(operation: Operation): OperationHandle = {
sessionEvent.totalOperations += 1
super.runOperation(operation)
diff --git a/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/TrinoStatementSuite.scala b/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/TrinoStatementSuite.scala
index fc9f1af5f79..dec753ad4f6 100644
--- a/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/TrinoStatementSuite.scala
+++ b/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/TrinoStatementSuite.scala
@@ -30,15 +30,15 @@ class TrinoStatementSuite extends WithTrinoContainerServer {
assert(schema.size === 1)
assert(schema(0).getName === "_col0")
- assert(resultSet.toIterator.hasNext)
- assert(resultSet.toIterator.next() === List(1))
+ assert(resultSet.hasNext)
+ assert(resultSet.next() === List(1))
val trinoStatement2 = TrinoStatement(trinoContext, kyuubiConf, "show schemas")
val schema2 = trinoStatement2.getColumns
val resultSet2 = trinoStatement2.execute()
assert(schema2.size === 1)
- assert(resultSet2.toIterator.hasNext)
+ assert(resultSet2.hasNext)
}
}
diff --git a/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperationSuite.scala b/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperationSuite.scala
index a6f125af52c..90939a3e4e0 100644
--- a/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperationSuite.scala
+++ b/externals/kyuubi-trino-engine/src/test/scala/org/apache/kyuubi/engine/trino/operation/TrinoOperationSuite.scala
@@ -590,14 +590,14 @@ class TrinoOperationSuite extends WithTrinoEngine with TrinoQueryTests {
val tFetchResultsReq1 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_NEXT, 1)
val tFetchResultsResp1 = client.FetchResults(tFetchResultsReq1)
assert(tFetchResultsResp1.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq1 = tFetchResultsResp1.getResults.getColumns.get(0).getI32Val.getValues.asScala.toSeq
+ val idSeq1 = tFetchResultsResp1.getResults.getColumns.get(0).getI32Val.getValues.asScala
assertResult(Seq(0L))(idSeq1)
// fetch next from first row
val tFetchResultsReq2 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_NEXT, 1)
val tFetchResultsResp2 = client.FetchResults(tFetchResultsReq2)
assert(tFetchResultsResp2.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
- val idSeq2 = tFetchResultsResp2.getResults.getColumns.get(0).getI32Val.getValues.asScala.toSeq
+ val idSeq2 = tFetchResultsResp2.getResults.getColumns.get(0).getI32Val.getValues.asScala
assertResult(Seq(1L))(idSeq2)
val tFetchResultsReq3 = new TFetchResultsReq(opHandle, TFetchOrientation.FETCH_PRIOR, 1)
@@ -607,7 +607,7 @@ class TrinoOperationSuite extends WithTrinoEngine with TrinoQueryTests {
} else {
assert(tFetchResultsResp3.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq3 =
- tFetchResultsResp3.getResults.getColumns.get(0).getI32Val.getValues.asScala.toSeq
+ tFetchResultsResp3.getResults.getColumns.get(0).getI32Val.getValues.asScala
assertResult(Seq(0L))(idSeq3)
}
@@ -618,7 +618,7 @@ class TrinoOperationSuite extends WithTrinoEngine with TrinoQueryTests {
} else {
assert(tFetchResultsResp4.getStatus.getStatusCode === TStatusCode.SUCCESS_STATUS)
val idSeq4 =
- tFetchResultsResp4.getResults.getColumns.get(0).getI32Val.getValues.asScala.toSeq
+ tFetchResultsResp4.getResults.getColumns.get(0).getI32Val.getValues.asScala
assertResult(Seq(0L, 1L))(idSeq4)
}
}
@@ -771,8 +771,8 @@ class TrinoOperationSuite extends WithTrinoEngine with TrinoQueryTests {
assert(schema.size === 1)
assert(schema(0).getName === "_col0")
- assert(resultSet.toIterator.hasNext)
- version = resultSet.toIterator.next().head.toString
+ assert(resultSet.hasNext)
+ version = resultSet.next().head.toString
}
version
}
diff --git a/integration-tests/kyuubi-flink-it/pom.xml b/integration-tests/kyuubi-flink-it/pom.xml
index c6a55c62cb6..3c0e3f31a7c 100644
--- a/integration-tests/kyuubi-flink-it/pom.xml
+++ b/integration-tests/kyuubi-flink-it/pom.xml
@@ -21,11 +21,11 @@
    <parent>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

-    <artifactId>kyuubi-flink-it_2.12</artifactId>
+    <artifactId>kyuubi-flink-it_${scala.binary.version}</artifactId>
    <name>Kyuubi Test Flink SQL IT</name>
    <url>https://kyuubi.apache.org/</url>
@@ -79,6 +79,37 @@
            <scope>test</scope>
        </dependency>
+
+        <dependency>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-client-minicluster</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.bouncycastle</groupId>
+            <artifactId>bcprov-jdk15on</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.bouncycastle</groupId>
+            <artifactId>bcpkix-jdk15on</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>jakarta.activation</groupId>
+            <artifactId>jakarta.activation-api</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>jakarta.xml.bind</groupId>
+            <artifactId>jakarta.xml.bind-api</artifactId>
+            <scope>test</scope>
+        </dependency>
diff --git a/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/WithKyuubiServerAndYarnMiniCluster.scala b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/WithKyuubiServerAndYarnMiniCluster.scala
new file mode 100644
index 00000000000..de9a8ae2d28
--- /dev/null
+++ b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/WithKyuubiServerAndYarnMiniCluster.scala
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.it.flink
+
+import java.io.{File, FileWriter}
+import java.nio.file.Paths
+
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+
+import org.apache.kyuubi.{KyuubiFunSuite, Utils, WithKyuubiServer}
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf.KYUUBI_ENGINE_ENV_PREFIX
+import org.apache.kyuubi.server.{MiniDFSService, MiniYarnService}
+
+trait WithKyuubiServerAndYarnMiniCluster extends KyuubiFunSuite with WithKyuubiServer {
+
+ val kyuubiHome: String = Utils.getCodeSourceLocation(getClass).split("integration-tests").head
+
+ override protected val conf: KyuubiConf = new KyuubiConf(false)
+
+ protected var miniHdfsService: MiniDFSService = _
+
+ protected var miniYarnService: MiniYarnService = _
+
+ private val yarnConf: YarnConfiguration = {
+ val yarnConfig = new YarnConfiguration()
+
+ // configurations copied from org.apache.flink.yarn.YarnTestBase
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 32)
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 4096)
+
+ yarnConfig.setBoolean(YarnConfiguration.RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME, true)
+ yarnConfig.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, 2)
+ yarnConfig.setInt(YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS, 2)
+ yarnConfig.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES, 4)
+ yarnConfig.setInt(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600)
+ yarnConfig.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, false)
+ // memory is overwritten in the MiniYARNCluster.
+ // so we have to change the number of cores for testing.
+ yarnConfig.setInt(YarnConfiguration.NM_VCORES, 666)
+ yarnConfig.setFloat(YarnConfiguration.NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE, 99.0f)
+ yarnConfig.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS, 1000)
+ yarnConfig.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 5000)
+
+ // capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround
+ yarnConfig.set("yarn.scheduler.capacity.root.queues", "default,four_cores_queue")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.root.default.capacity", 100)
+ yarnConfig.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100)
+ yarnConfig.set("yarn.scheduler.capacity.root.default.state", "RUNNING")
+ yarnConfig.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*")
+ yarnConfig.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-capacity", 100)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-applications", 10)
+ yarnConfig.setInt("yarn.scheduler.capacity.root.four_cores_queue.maximum-allocation-vcores", 4)
+ yarnConfig.setFloat("yarn.scheduler.capacity.root.four_cores_queue.user-limit-factor", 1)
+ yarnConfig.set("yarn.scheduler.capacity.root.four_cores_queue.acl_submit_applications", "*")
+ yarnConfig.set("yarn.scheduler.capacity.root.four_cores_queue.acl_administer_queue", "*")
+
+ yarnConfig.setInt("yarn.scheduler.capacity.node-locality-delay", -1)
+ // Set bind host to localhost to avoid java.net.BindException
+ yarnConfig.set(YarnConfiguration.RM_BIND_HOST, "localhost")
+ yarnConfig.set(YarnConfiguration.NM_BIND_HOST, "localhost")
+
+ yarnConfig
+ }
+
+ override def beforeAll(): Unit = {
+ miniHdfsService = new MiniDFSService()
+ miniHdfsService.initialize(conf)
+ miniHdfsService.start()
+
+ val hdfsServiceUrl = s"hdfs://localhost:${miniHdfsService.getDFSPort}"
+ yarnConf.set("fs.defaultFS", hdfsServiceUrl)
+ yarnConf.addResource(miniHdfsService.getHadoopConf)
+
+ val cp = System.getProperty("java.class.path")
+ // exclude kyuubi flink engine jar that has SPI for EmbeddedExecutorFactory
+ // which can't be initialized on the client side
+ val hadoopJars = cp.split(":").filter(s => !s.contains("flink"))
+ val hadoopClasspath = hadoopJars.mkString(":")
+ yarnConf.set("yarn.application.classpath", hadoopClasspath)
+
+ miniYarnService = new MiniYarnService()
+ miniYarnService.setYarnConf(yarnConf)
+ miniYarnService.initialize(conf)
+ miniYarnService.start()
+
+ val hadoopConfDir = Utils.createTempDir().toFile
+ val writer = new FileWriter(new File(hadoopConfDir, "core-site.xml"))
+ yarnConf.writeXml(writer)
+ writer.close()
+
+ val flinkHome = {
+ val candidates = Paths.get(kyuubiHome, "externals", "kyuubi-download", "target")
+ .toFile.listFiles(f => f.getName.contains("flink"))
+ if (candidates == null) None else candidates.map(_.toPath).headOption
+ }
+ if (flinkHome.isEmpty) {
+ throw new IllegalStateException(s"Flink home not found in $kyuubiHome/externals")
+ }
+
+ conf.set(s"$KYUUBI_ENGINE_ENV_PREFIX.KYUUBI_HOME", kyuubiHome)
+ conf.set(s"$KYUUBI_ENGINE_ENV_PREFIX.FLINK_HOME", flinkHome.get.toString)
+ conf.set(
+ s"$KYUUBI_ENGINE_ENV_PREFIX.FLINK_CONF_DIR",
+ s"${flinkHome.get.toString}${File.separator}conf")
+ conf.set(s"$KYUUBI_ENGINE_ENV_PREFIX.HADOOP_CLASSPATH", hadoopClasspath)
+ conf.set(s"$KYUUBI_ENGINE_ENV_PREFIX.HADOOP_CONF_DIR", hadoopConfDir.getAbsolutePath)
+ conf.set(s"flink.containerized.master.env.HADOOP_CLASSPATH", hadoopClasspath)
+ conf.set(s"flink.containerized.master.env.HADOOP_CONF_DIR", hadoopConfDir.getAbsolutePath)
+ conf.set(s"flink.containerized.taskmanager.env.HADOOP_CONF_DIR", hadoopConfDir.getAbsolutePath)
+
+ super.beforeAll()
+ }
+
+ override def afterAll(): Unit = {
+ super.afterAll()
+ if (miniYarnService != null) {
+ miniYarnService.stop()
+ miniYarnService = null
+ }
+ if (miniHdfsService != null) {
+ miniHdfsService.stop()
+ miniHdfsService = null
+ }
+ }
+}
diff --git a/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuite.scala b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuite.scala
index 893e0020a6a..55476bfd003 100644
--- a/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuite.scala
+++ b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuite.scala
@@ -31,7 +31,7 @@ class FlinkOperationSuite extends WithKyuubiServerAndFlinkMiniCluster
override val conf: KyuubiConf = KyuubiConf()
.set(s"$KYUUBI_ENGINE_ENV_PREFIX.$KYUUBI_HOME", kyuubiHome)
.set(ENGINE_TYPE, "FLINK_SQL")
- .set("flink.parallelism.default", "6")
+ .set("flink.parallelism.default", "2")
override protected def jdbcUrl: String = getJdbcUrl
@@ -72,7 +72,7 @@ class FlinkOperationSuite extends WithKyuubiServerAndFlinkMiniCluster
var success = false
while (resultSet.next() && !success) {
if (resultSet.getString(1) == "parallelism.default" &&
- resultSet.getString(2) == "6") {
+ resultSet.getString(2) == "2") {
success = true
}
}
diff --git a/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuiteOnYarn.scala b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuiteOnYarn.scala
new file mode 100644
index 00000000000..ee6b9bb98ea
--- /dev/null
+++ b/integration-tests/kyuubi-flink-it/src/test/scala/org/apache/kyuubi/it/flink/operation/FlinkOperationSuiteOnYarn.scala
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.it.flink.operation
+
+import org.apache.hive.service.rpc.thrift.{TGetInfoReq, TGetInfoType}
+
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf._
+import org.apache.kyuubi.it.flink.WithKyuubiServerAndYarnMiniCluster
+import org.apache.kyuubi.operation.HiveJDBCTestHelper
+import org.apache.kyuubi.operation.meta.ResultSetSchemaConstant.TABLE_CAT
+
+class FlinkOperationSuiteOnYarn extends WithKyuubiServerAndYarnMiniCluster
+ with HiveJDBCTestHelper {
+
+ override protected def jdbcUrl: String = {
+ // delay the access to thrift service because the thrift service
+ // may not be ready although it's registered
+ Thread.sleep(3000L)
+ getJdbcUrl
+ }
+
+ override def beforeAll(): Unit = {
+ conf
+ .set(s"$KYUUBI_ENGINE_ENV_PREFIX.$KYUUBI_HOME", kyuubiHome)
+ .set(ENGINE_TYPE, "FLINK_SQL")
+ .set("flink.execution.target", "yarn-application")
+ .set("flink.parallelism.default", "2")
+ super.beforeAll()
+ }
+
+ test("get catalogs for flink sql") {
+ withJdbcStatement() { statement =>
+ val meta = statement.getConnection.getMetaData
+ val catalogs = meta.getCatalogs
+ val expected = Set("default_catalog").toIterator
+ while (catalogs.next()) {
+ assert(catalogs.getString(TABLE_CAT) === expected.next())
+ }
+ assert(!expected.hasNext)
+ assert(!catalogs.next())
+ }
+ }
+
+ test("execute statement - create/alter/drop table") {
+ withJdbcStatement() { statement =>
+ statement.executeQuery("create table tbl_a (a string) with ('connector' = 'blackhole')")
+ assert(statement.execute("alter table tbl_a rename to tbl_b"))
+ assert(statement.execute("drop table tbl_b"))
+ }
+ }
+
+ test("execute statement - select column name with dots") {
+ withJdbcStatement() { statement =>
+ val resultSet = statement.executeQuery("select 'tmp.hello'")
+ assert(resultSet.next())
+ assert(resultSet.getString(1) === "tmp.hello")
+ }
+ }
+
+ test("set kyuubi conf into flink conf") {
+ withJdbcStatement() { statement =>
+ val resultSet = statement.executeQuery("SET")
+ // Flink does not support set key without value currently,
+ // thus read all rows to find the desired one
+ var success = false
+ while (resultSet.next() && !success) {
+ if (resultSet.getString(1) == "parallelism.default" &&
+ resultSet.getString(2) == "2") {
+ success = true
+ }
+ }
+ assert(success)
+ }
+ }
+
+ test("server info provider - server") {
+ withSessionConf(Map(KyuubiConf.SERVER_INFO_PROVIDER.key -> "SERVER"))()() {
+ withSessionHandle { (client, handle) =>
+ val req = new TGetInfoReq()
+ req.setSessionHandle(handle)
+ req.setInfoType(TGetInfoType.CLI_DBMS_NAME)
+ assert(client.GetInfo(req).getInfoValue.getStringValue === "Apache Kyuubi")
+ }
+ }
+ }
+
+ test("server info provider - engine") {
+ withSessionConf(Map(KyuubiConf.SERVER_INFO_PROVIDER.key -> "ENGINE"))()() {
+ withSessionHandle { (client, handle) =>
+ val req = new TGetInfoReq()
+ req.setSessionHandle(handle)
+ req.setInfoType(TGetInfoType.CLI_DBMS_NAME)
+ assert(client.GetInfo(req).getInfoValue.getStringValue === "Apache Flink")
+ }
+ }
+ }
+}
diff --git a/integration-tests/kyuubi-hive-it/pom.xml b/integration-tests/kyuubi-hive-it/pom.xml
index ff9a6b35ea6..24e5529a2d3 100644
--- a/integration-tests/kyuubi-hive-it/pom.xml
+++ b/integration-tests/kyuubi-hive-it/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-hive-it_2.12</artifactId>
+    <artifactId>kyuubi-hive-it_${scala.binary.version}</artifactId>
     <name>Kyuubi Test Hive IT</name>
     <url>https://kyuubi.apache.org/</url>
diff --git a/integration-tests/kyuubi-hive-it/src/test/scala/org/apache/kyuubi/it/hive/operation/KyuubiOperationHiveEnginePerUserSuite.scala b/integration-tests/kyuubi-hive-it/src/test/scala/org/apache/kyuubi/it/hive/operation/KyuubiOperationHiveEnginePerUserSuite.scala
index a4e6bb150b9..07e2bc0f2c7 100644
--- a/integration-tests/kyuubi-hive-it/src/test/scala/org/apache/kyuubi/it/hive/operation/KyuubiOperationHiveEnginePerUserSuite.scala
+++ b/integration-tests/kyuubi-hive-it/src/test/scala/org/apache/kyuubi/it/hive/operation/KyuubiOperationHiveEnginePerUserSuite.scala
@@ -61,4 +61,21 @@ class KyuubiOperationHiveEnginePerUserSuite extends WithKyuubiServer with HiveEn
}
}
}
+
+ test("kyuubi defined function - system_user, session_user") {
+ withJdbcStatement("hive_engine_test") { statement =>
+ val rs = statement.executeQuery("SELECT system_user(), session_user()")
+ assert(rs.next())
+ assert(rs.getString(1) === Utils.currentUser)
+ assert(rs.getString(2) === Utils.currentUser)
+ }
+ }
+
+ test("kyuubi defined function - engine_id") {
+ withJdbcStatement("hive_engine_test") { statement =>
+ val rs = statement.executeQuery("SELECT engine_id()")
+ assert(rs.next())
+ assert(rs.getString(1).nonEmpty)
+ }
+ }
}
diff --git a/integration-tests/kyuubi-jdbc-it/pom.xml b/integration-tests/kyuubi-jdbc-it/pom.xml
index 2d95de78ed8..08f74512e90 100644
--- a/integration-tests/kyuubi-jdbc-it/pom.xml
+++ b/integration-tests/kyuubi-jdbc-it/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-jdbc-it_2.12</artifactId>
+    <artifactId>kyuubi-jdbc-it_${scala.binary.version}</artifactId>
     <name>Kyuubi Test Jdbc IT</name>
     <url>https://kyuubi.apache.org/</url>
diff --git a/integration-tests/kyuubi-kubernetes-it/pom.xml b/integration-tests/kyuubi-kubernetes-it/pom.xml
index a796ccab59a..a4334e497c7 100644
--- a/integration-tests/kyuubi-kubernetes-it/pom.xml
+++ b/integration-tests/kyuubi-kubernetes-it/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>
diff --git a/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/deployment/KyuubiOnKubernetesTestsSuite.scala b/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/deployment/KyuubiOnKubernetesTestsSuite.scala
index bc7c98a80c7..73cb5620a51 100644
--- a/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/deployment/KyuubiOnKubernetesTestsSuite.scala
+++ b/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/deployment/KyuubiOnKubernetesTestsSuite.scala
@@ -54,7 +54,7 @@ class KyuubiOnKubernetesWithSparkTestsBase extends WithKyuubiServerOnKubernetes
super.connectionConf ++
Map(
"spark.master" -> s"k8s://$miniKubeApiMaster",
- "spark.kubernetes.container.image" -> "apache/spark:v3.3.2",
+ "spark.kubernetes.container.image" -> "apache/spark:3.4.1",
"spark.executor.memory" -> "512M",
"spark.driver.memory" -> "1024M",
"spark.kubernetes.driver.request.cores" -> "250m",
diff --git a/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/spark/SparkOnKubernetesTestsSuite.scala b/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/spark/SparkOnKubernetesTestsSuite.scala
index 5141ff4d7ea..3f591e604dc 100644
--- a/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/spark/SparkOnKubernetesTestsSuite.scala
+++ b/integration-tests/kyuubi-kubernetes-it/src/test/scala/org/apache/kyuubi/kubernetes/test/spark/SparkOnKubernetesTestsSuite.scala
@@ -19,7 +19,6 @@ package org.apache.kyuubi.kubernetes.test.spark
import java.util.UUID
-import scala.collection.JavaConverters._
import scala.concurrent.duration._
import org.apache.hadoop.conf.Configuration
@@ -29,7 +28,7 @@ import org.apache.kyuubi._
import org.apache.kyuubi.client.util.BatchUtils._
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf.FRONTEND_THRIFT_BINARY_BIND_HOST
-import org.apache.kyuubi.engine.{ApplicationInfo, ApplicationOperation, KubernetesApplicationOperation}
+import org.apache.kyuubi.engine.{ApplicationInfo, ApplicationManagerInfo, ApplicationOperation, KubernetesApplicationOperation}
import org.apache.kyuubi.engine.ApplicationState.{FAILED, NOT_FOUND, RUNNING}
import org.apache.kyuubi.engine.spark.SparkProcessBuilder
import org.apache.kyuubi.kubernetes.test.MiniKube
@@ -44,11 +43,14 @@ abstract class SparkOnKubernetesSuiteBase
MiniKube.getKubernetesClient.getMasterUrl.toString
}
+ protected val appMgrInfo =
+ ApplicationManagerInfo(Some(s"k8s://$apiServerAddress"), Some("minikube"), None)
+
protected def sparkOnK8sConf: KyuubiConf = {
// TODO Support more Spark version
// Spark official docker image: https://hub.docker.com/r/apache/spark/tags
KyuubiConf().set("spark.master", s"k8s://$apiServerAddress")
- .set("spark.kubernetes.container.image", "apache/spark:v3.3.2")
+ .set("spark.kubernetes.container.image", "apache/spark:3.4.1")
.set("spark.kubernetes.container.image.pullPolicy", "IfNotPresent")
.set("spark.executor.instances", "1")
.set("spark.executor.memory", "512M")
@@ -57,6 +59,7 @@ abstract class SparkOnKubernetesSuiteBase
.set("spark.kubernetes.executor.request.cores", "250m")
.set("kyuubi.kubernetes.context", "minikube")
.set("kyuubi.frontend.protocols", "THRIFT_BINARY,REST")
+ .set("kyuubi.session.engine.initialize.timeout", "PT10M")
}
}
@@ -145,24 +148,31 @@ class KyuubiOperationKubernetesClusterClientModeSuite
"kyuubi",
"passwd",
"localhost",
- batchRequest.getConf.asScala.toMap,
batchRequest)
eventually(timeout(3.minutes), interval(50.milliseconds)) {
- val state = k8sOperation.getApplicationInfoByTag(sessionHandle.identifier.toString)
+ val state = k8sOperation.getApplicationInfoByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(state.id != null)
assert(state.name != null)
assert(state.state == RUNNING)
}
- val killResponse = k8sOperation.killApplicationByTag(sessionHandle.identifier.toString)
+ val killResponse = k8sOperation.killApplicationByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(killResponse._1)
assert(killResponse._2 startsWith "Succeeded to terminate:")
- val appInfo = k8sOperation.getApplicationInfoByTag(sessionHandle.identifier.toString)
+ val appInfo = k8sOperation.getApplicationInfoByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(appInfo == ApplicationInfo(null, null, NOT_FOUND))
- val failKillResponse = k8sOperation.killApplicationByTag(sessionHandle.identifier.toString)
+ val failKillResponse = k8sOperation.killApplicationByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(!failKillResponse._1)
assert(failKillResponse._2 === ApplicationOperation.NOT_FOUND)
}
@@ -205,30 +215,37 @@ class KyuubiOperationKubernetesClusterClusterModeSuite
"runner",
"passwd",
"localhost",
- batchRequest.getConf.asScala.toMap,
batchRequest)
// wait for driver pod start
eventually(timeout(3.minutes), interval(5.second)) {
// trigger k8sOperation init here
- val appInfo = k8sOperation.getApplicationInfoByTag(sessionHandle.identifier.toString)
+ val appInfo = k8sOperation.getApplicationInfoByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(appInfo.state == RUNNING)
assert(appInfo.name.startsWith(driverPodNamePrefix))
}
- val killResponse = k8sOperation.killApplicationByTag(sessionHandle.identifier.toString)
+ val killResponse = k8sOperation.killApplicationByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(killResponse._1)
assert(killResponse._2 endsWith "is completed")
assert(killResponse._2 contains sessionHandle.identifier.toString)
eventually(timeout(3.minutes), interval(50.milliseconds)) {
- val appInfo = k8sOperation.getApplicationInfoByTag(sessionHandle.identifier.toString)
+ val appInfo = k8sOperation.getApplicationInfoByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
// We may kill engine start but not ready
// An EOF Error occurred when the driver was starting
assert(appInfo.state == FAILED || appInfo.state == NOT_FOUND)
}
- val failKillResponse = k8sOperation.killApplicationByTag(sessionHandle.identifier.toString)
+ val failKillResponse = k8sOperation.killApplicationByTag(
+ appMgrInfo,
+ sessionHandle.identifier.toString)
assert(!failKillResponse._1)
}
}
diff --git a/integration-tests/kyuubi-trino-it/pom.xml b/integration-tests/kyuubi-trino-it/pom.xml
index 107d621b075..628f63818b9 100644
--- a/integration-tests/kyuubi-trino-it/pom.xml
+++ b/integration-tests/kyuubi-trino-it/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-trino-it_2.12</artifactId>
+    <artifactId>kyuubi-trino-it_${scala.binary.version}</artifactId>
     <name>Kyuubi Test Trino IT</name>
     <url>https://kyuubi.apache.org/</url>
diff --git a/integration-tests/kyuubi-trino-it/src/test/scala/org/apache/kyuubi/it/trino/server/TrinoFrontendSuite.scala b/integration-tests/kyuubi-trino-it/src/test/scala/org/apache/kyuubi/it/trino/server/TrinoFrontendSuite.scala
index 4a175a28b7a..7575bf8a9b4 100644
--- a/integration-tests/kyuubi-trino-it/src/test/scala/org/apache/kyuubi/it/trino/server/TrinoFrontendSuite.scala
+++ b/integration-tests/kyuubi-trino-it/src/test/scala/org/apache/kyuubi/it/trino/server/TrinoFrontendSuite.scala
@@ -73,7 +73,7 @@ class TrinoFrontendSuite extends WithKyuubiServer with SparkMetadataTests {
statement.execute("SELECT 1")
}
} catch {
- case NonFatal(e) =>
+ case NonFatal(_) =>
}
}
}
diff --git a/integration-tests/kyuubi-zookeeper-it/pom.xml b/integration-tests/kyuubi-zookeeper-it/pom.xml
index bded1585b71..869fd40b2bb 100644
--- a/integration-tests/kyuubi-zookeeper-it/pom.xml
+++ b/integration-tests/kyuubi-zookeeper-it/pom.xml
@@ -21,11 +21,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>integration-tests</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-zookeeper-it_2.12</artifactId>
+    <artifactId>kyuubi-zookeeper-it_${scala.binary.version}</artifactId>
     <name>Kyuubi Test Zookeeper IT</name>
     <url>https://kyuubi.apache.org/</url>
diff --git a/integration-tests/pom.xml b/integration-tests/pom.xml
index b6a48daaedc..35d0b4f9ea7 100644
--- a/integration-tests/pom.xml
+++ b/integration-tests/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
     </parent>

     <artifactId>integration-tests</artifactId>
diff --git a/kyuubi-assembly/pom.xml b/kyuubi-assembly/pom.xml
index 0524470a20d..4fa0d9a0fd3 100644
--- a/kyuubi-assembly/pom.xml
+++ b/kyuubi-assembly/pom.xml
@@ -22,11 +22,11 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-assembly_2.12</artifactId>
+    <artifactId>kyuubi-assembly_${scala.binary.version}</artifactId>
     <packaging>pom</packaging>
     <name>Kyuubi Project Assembly</name>
     <url>https://kyuubi.apache.org/</url>
@@ -69,28 +69,18 @@
         <dependency>
-            <groupId>org.apache.hadoop</groupId>
-            <artifactId>hadoop-client-api</artifactId>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>${kyuubi-shaded-zookeeper.artifacts}</artifactId>
         </dependency>

         <dependency>
             <groupId>org.apache.hadoop</groupId>
-            <artifactId>hadoop-client-runtime</artifactId>
-        </dependency>
-
-        <dependency>
-            <groupId>org.apache.curator</groupId>
-            <artifactId>curator-framework</artifactId>
-        </dependency>
-
-        <dependency>
-            <groupId>org.apache.curator</groupId>
-            <artifactId>curator-client</artifactId>
+            <artifactId>hadoop-client-api</artifactId>
         </dependency>

         <dependency>
-            <groupId>org.apache.curator</groupId>
-            <artifactId>curator-recipes</artifactId>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-client-runtime</artifactId>
         </dependency>
diff --git a/kyuubi-common/pom.xml b/kyuubi-common/pom.xml
index d62761d72b3..0d5c491b51c 100644
--- a/kyuubi-common/pom.xml
+++ b/kyuubi-common/pom.xml
@@ -21,20 +21,20 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

-    <artifactId>kyuubi-common_2.12</artifactId>
+    <artifactId>kyuubi-common_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Project Common</name>
     <url>https://kyuubi.apache.org/</url>

     <dependencies>
         <dependency>
-            <groupId>com.vladsch.flexmark</groupId>
-            <artifactId>flexmark-all</artifactId>
-            <scope>test</scope>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+            <version>${project.version}</version>
         </dependency>
@@ -128,6 +128,13 @@
             <artifactId>HikariCP</artifactId>
         </dependency>

+        <dependency>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+            <version>${project.version}</version>
+            <type>test-jar</type>
+        </dependency>
+
         <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-minikdc</artifactId>
@@ -148,7 +155,7 @@
             <groupId>org.scalatestplus</groupId>
-            <artifactId>mockito-4-6_${scala.binary.version}</artifactId>
+            <artifactId>mockito-4-11_${scala.binary.version}</artifactId>
             <scope>test</scope>
@@ -164,11 +171,23 @@
             <scope>test</scope>
         </dependency>

+        <dependency>
+            <groupId>org.xerial</groupId>
+            <artifactId>sqlite-jdbc</artifactId>
+            <scope>test</scope>
+        </dependency>
+
         <dependency>
             <groupId>com.jakewharton.fliptables</groupId>
             <artifactId>fliptables</artifactId>
             <scope>test</scope>
         </dependency>
+
+        <dependency>
+            <groupId>com.vladsch.flexmark</groupId>
+            <artifactId>flexmark-all</artifactId>
+            <scope>test</scope>
+        </dependency>
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/KyuubiSQLException.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/KyuubiSQLException.scala
index a9e486fb2b6..570ee6d3873 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/KyuubiSQLException.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/KyuubiSQLException.scala
@@ -26,6 +26,7 @@ import scala.collection.JavaConverters._
import org.apache.hive.service.rpc.thrift.{TStatus, TStatusCode}
import org.apache.kyuubi.Utils.stringifyException
+import org.apache.kyuubi.util.reflect.DynConstructors
/**
* @param reason a description of the exception
@@ -139,9 +140,10 @@ object KyuubiSQLException {
}
private def newInstance(className: String, message: String, cause: Throwable): Throwable = {
try {
- Class.forName(className)
- .getConstructor(classOf[String], classOf[Throwable])
- .newInstance(message, cause).asInstanceOf[Throwable]
+ DynConstructors.builder()
+ .impl(className, classOf[String], classOf[Throwable])
+ .buildChecked[Throwable]()
+ .newInstance(message, cause)
} catch {
case _: Exception => new RuntimeException(className + ":" + message, cause)
}
@@ -154,7 +156,7 @@ object KyuubiSQLException {
(i1, i2, i3)
}
- def toCause(details: Seq[String]): Throwable = {
+ def toCause(details: Iterable[String]): Throwable = {
var ex: Throwable = null
if (details != null && details.nonEmpty) {
val head = details.head
@@ -170,7 +172,7 @@ object KyuubiSQLException {
val lineNum = line.substring(i3 + 1).toInt
new StackTraceElement(clzName, methodName, fileName, lineNum)
}
- ex = newInstance(exClz, msg, toCause(details.slice(length + 2, details.length)))
+ ex = newInstance(exClz, msg, toCause(details.slice(length + 2, details.size)))
ex.setStackTrace(stackTraceElements.toArray)
}
ex
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/Logging.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/Logging.scala
index 1df598132fb..d6dcc8d345a 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/Logging.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/Logging.scala
@@ -23,7 +23,7 @@ import org.apache.logging.log4j.core.config.DefaultConfiguration
import org.slf4j.{Logger, LoggerFactory}
import org.slf4j.bridge.SLF4JBridgeHandler
-import org.apache.kyuubi.util.ClassUtils
+import org.apache.kyuubi.util.reflect.ReflectUtils
/**
* Simple version of logging adopted from Apache Spark.
@@ -116,8 +116,9 @@ object Logging {
// This distinguishes the log4j 1.2 binding, currently
// org.slf4j.impl.Log4jLoggerFactory, from the log4j 2.0 binding, currently
// org.apache.logging.slf4j.Log4jLoggerFactory
- "org.slf4j.impl.Log4jLoggerFactory"
- .equals(LoggerFactory.getILoggerFactory.getClass.getName)
+ val binderClass = LoggerFactory.getILoggerFactory.getClass.getName
+ "org.slf4j.impl.Log4jLoggerFactory".equals(
+ binderClass) || "org.slf4j.impl.Reload4jLoggerFactory".equals(binderClass)
}
private[kyuubi] def isLog4j2: Boolean = {
@@ -148,7 +149,7 @@ object Logging {
isInterpreter: Boolean,
loggerName: String,
logger: => Logger): Unit = {
- if (ClassUtils.classIsLoadable("org.slf4j.bridge.SLF4JBridgeHandler")) {
+ if (ReflectUtils.isClassLoadable("org.slf4j.bridge.SLF4JBridgeHandler")) {
// Handles configuring the JUL -> SLF4J bridge
SLF4JBridgeHandler.removeHandlersForRootLogger()
SLF4JBridgeHandler.install()
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/Utils.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/Utils.scala
index 3a03682ff1b..accfca4c98f 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/Utils.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/Utils.scala
@@ -21,9 +21,12 @@ import java.io._
import java.net.{Inet4Address, InetAddress, NetworkInterface}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
+import java.security.PrivilegedAction
import java.text.SimpleDateFormat
import java.util.{Date, Properties, TimeZone, UUID}
+import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicLong
+import java.util.concurrent.locks.Lock
import scala.collection.JavaConverters._
import scala.sys.process._
@@ -201,6 +204,14 @@ object Utils extends Logging {
def currentUser: String = UserGroupInformation.getCurrentUser.getShortUserName
+ def doAs[T](
+ proxyUser: String,
+ realUser: UserGroupInformation = UserGroupInformation.getCurrentUser)(f: () => T): T = {
+ UserGroupInformation.createProxyUser(proxyUser, realUser).doAs(new PrivilegedAction[T] {
+ override def run(): T = f()
+ })
+ }
+
private val shortVersionRegex = """^(\d+\.\d+\.\d+)(.*)?$""".r
/**
@@ -407,4 +418,35 @@ object Utils extends Logging {
stringWriter.toString
}
}
+
+ def withLockRequired[T](lock: Lock)(block: => T): T = {
+ try {
+ lock.lock()
+ block
+ } finally {
+ lock.unlock()
+ }
+ }
+
+ /**
+ * Try killing the process gracefully first, then forcibly if process does not exit in
+ * graceful period.
+ *
+ * @param process the being killed process
+ * @param gracefulPeriod the graceful killing period, in milliseconds
+ * @return the exit code if process exit normally, None if the process finally was killed
+ * forcibly
+ */
+ def terminateProcess(process: java.lang.Process, gracefulPeriod: Long): Option[Int] = {
+ process.destroy()
+ if (process.waitFor(gracefulPeriod, TimeUnit.MILLISECONDS)) {
+ Some(process.exitValue())
+ } else {
+ warn(s"Process does not exit after $gracefulPeriod ms, try to forcibly kill. " +
+ "Staging files generated by the process may be retained!")
+ process.destroyForcibly()
+ None
+ }
+ }
+
}
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigBuilder.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigBuilder.scala
index 62f060a052d..d6de402416d 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigBuilder.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigBuilder.scala
@@ -18,11 +18,14 @@
package org.apache.kyuubi.config
import java.time.Duration
+import java.util.Locale
import java.util.regex.PatternSyntaxException
import scala.util.{Failure, Success, Try}
import scala.util.matching.Regex
+import org.apache.kyuubi.util.EnumUtils._
+
private[kyuubi] case class ConfigBuilder(key: String) {
private[config] var _doc = ""
@@ -150,7 +153,7 @@ private[kyuubi] case class ConfigBuilder(key: String) {
}
}
- new TypedConfigBuilder(this, regexFromString(_, this.key), _.toString)
+ TypedConfigBuilder(this, regexFromString(_, this.key), _.toString)
}
}
@@ -166,6 +169,21 @@ private[kyuubi] case class TypedConfigBuilder[T](
def transform(fn: T => T): TypedConfigBuilder[T] = this.copy(fromStr = s => fn(fromStr(s)))
+ def transformToUpperCase: TypedConfigBuilder[T] = {
+ transformString(_.toUpperCase(Locale.ROOT))
+ }
+
+ def transformToLowerCase: TypedConfigBuilder[T] = {
+ transformString(_.toLowerCase(Locale.ROOT))
+ }
+
+ private def transformString(fn: String => String): TypedConfigBuilder[T] = {
+ require(parent._type == "string")
+ this.asInstanceOf[TypedConfigBuilder[String]]
+ .transform(fn)
+ .asInstanceOf[TypedConfigBuilder[T]]
+ }
+
/** Checks if the user-provided value for the config matches the validator. */
def checkValue(validator: T => Boolean, errMsg: String): TypedConfigBuilder[T] = {
transform { v =>
@@ -187,10 +205,35 @@ private[kyuubi] case class TypedConfigBuilder[T](
}
}
+ /** Checks if the user-provided value for the config matches the value set of the enumeration. */
+ def checkValues(enumeration: Enumeration): TypedConfigBuilder[T] = {
+ transform { v =>
+ val isValid = v match {
+ case iter: Iterable[Any] => isValidEnums(enumeration, iter)
+ case name => isValidEnum(enumeration, name)
+ }
+ if (!isValid) {
+ val actualValueStr = v match {
+ case iter: Iterable[Any] => iter.mkString(",")
+ case value => value.toString
+ }
+ throw new IllegalArgumentException(
+ s"The value of ${parent.key} should be one of ${enumeration.values.mkString(", ")}," +
+ s" but was $actualValueStr")
+ }
+ v
+ }
+ }
+
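The `checkValues(enumeration)` overload above accepts either a single value or an iterable and validates each element against the `Enumeration`'s members. A minimal sketch of that validation, with a simplified stand-in for `EnumUtils.isValidEnum` (the `Protocols` enum and `check` signature below are illustrative):

```scala
// Simplified stand-in for org.apache.kyuubi.util.EnumUtils.isValidEnum:
// a value is valid when its string form names a member of the enumeration.
object Protocols extends Enumeration {
  val THRIFT_BINARY, THRIFT_HTTP, REST = Value
}

def isValidEnum(e: Enumeration, name: Any): Boolean =
  e.values.exists(_.toString == name.toString)

// Mirrors checkValues: accept a single value or every element of an iterable.
def check(key: String, v: Any, e: Enumeration): Unit = {
  val isValid = v match {
    case iter: Iterable[Any] => iter.forall(isValidEnum(e, _))
    case single => isValidEnum(e, single)
  }
  if (!isValid) {
    throw new IllegalArgumentException(
      s"The value of $key should be one of ${e.values.mkString(", ")}")
  }
}

check("kyuubi.frontend.protocols", Seq("THRIFT_BINARY", "REST"), Protocols) // passes
```

Passing the `Enumeration` itself, rather than `values.map(_.toString)`, keeps the error message and the valid set in one place.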
/** Turns the config entry into a sequence of values of the underlying type. */
def toSequence(sp: String = ","): TypedConfigBuilder[Seq[T]] = {
parent._type = "seq"
- TypedConfigBuilder(parent, strToSeq(_, fromStr, sp), seqToStr(_, toStr))
+ TypedConfigBuilder(parent, strToSeq(_, fromStr, sp), iterableToStr(_, toStr))
+ }
+
+ def toSet(sp: String = ",", skipBlank: Boolean = true): TypedConfigBuilder[Set[T]] = {
+ parent._type = "set"
+ TypedConfigBuilder(parent, strToSet(_, fromStr, sp, skipBlank), iterableToStr(_, toStr))
}
def createOptional: OptionalConfigEntry[T] = {
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigHelpers.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigHelpers.scala
index 225f1b53726..525ea2ff4af 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigHelpers.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/ConfigHelpers.scala
@@ -17,6 +17,8 @@
package org.apache.kyuubi.config
+import org.apache.commons.lang3.StringUtils
+
import org.apache.kyuubi.Utils
object ConfigHelpers {
@@ -25,7 +27,11 @@ object ConfigHelpers {
Utils.strToSeq(str, sp).map(converter)
}
- def seqToStr[T](v: Seq[T], stringConverter: T => String): String = {
- v.map(stringConverter).mkString(",")
+ def strToSet[T](str: String, converter: String => T, sp: String, skipBlank: Boolean): Set[T] = {
+ Utils.strToSeq(str, sp).filter(!skipBlank || StringUtils.isNotBlank(_)).map(converter).toSet
+ }
+
+ def iterableToStr[T](v: Iterable[T], stringConverter: T => String, sp: String = ","): String = {
+ v.map(stringConverter).mkString(sp)
}
}
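The new `strToSet` helper deduplicates entries and optionally drops blanks before conversion. A sketch of those semantics, assuming `Utils.strToSeq` behaves like a trim-and-split:

```scala
// Assumed behavior of Utils.strToSeq: split on the separator and trim entries.
def strToSet[T](str: String, converter: String => T, sp: String, skipBlank: Boolean): Set[T] =
  str.split(sp).map(_.trim)
    .filter(s => !skipBlank || s.nonEmpty)
    .map(converter)
    .toSet

// Blanks are dropped and duplicates collapse into the Set.
val disallowed = strToSet("SSLv2, SSLv3,, SSLv2", (s: String) => s, ",", skipBlank = true)
println(disallowed) // Set(SSLv2, SSLv3)
```

This is why configs such as `kyuubi.frontend.thrift.binary.ssl.disallowed.protocols` can move from `Seq[String]` to `Set[String]` without changing the accepted string form.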
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
index b5229e2ad4f..50006b95ea1 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
@@ -42,7 +42,7 @@ case class KyuubiConf(loadSysDefault: Boolean = true) extends Logging {
}
if (loadSysDefault) {
- val fromSysDefaults = Utils.getSystemProperties.filterKeys(_.startsWith("kyuubi."))
+ val fromSysDefaults = Utils.getSystemProperties.filterKeys(_.startsWith("kyuubi.")).toMap
loadFromMap(fromSysDefaults)
}
@@ -103,7 +103,6 @@ case class KyuubiConf(loadSysDefault: Boolean = true) extends Logging {
/** unset a parameter from the configuration */
def unset(key: String): KyuubiConf = {
- logDeprecationWarning(key)
settings.remove(key)
this
}
@@ -135,6 +134,31 @@ case class KyuubiConf(loadSysDefault: Boolean = true) extends Logging {
getAllWithPrefix(s"$KYUUBI_BATCH_CONF_PREFIX.$normalizedBatchType", "")
}
+ /** Get the kubernetes conf for specified kubernetes context and namespace. */
+ def getKubernetesConf(context: Option[String], namespace: Option[String]): KyuubiConf = {
+ val conf = this.clone
+ context.foreach { c =>
+ val contextConf =
+ getAllWithPrefix(s"$KYUUBI_KUBERNETES_CONF_PREFIX.$c", "").map { case (suffix, value) =>
+ s"$KYUUBI_KUBERNETES_CONF_PREFIX.$suffix" -> value
+ }
+ val contextNamespaceConf = namespace.map { ns =>
+ getAllWithPrefix(s"$KYUUBI_KUBERNETES_CONF_PREFIX.$c.$ns", "").map {
+ case (suffix, value) =>
+ s"$KYUUBI_KUBERNETES_CONF_PREFIX.$suffix" -> value
+ }
+ }.getOrElse(Map.empty)
+
+ (contextConf ++ contextNamespaceConf).map { case (key, value) =>
+ conf.set(key, value)
+ }
+ conf.set(KUBERNETES_CONTEXT, c)
+ namespace.foreach(ns => conf.set(KUBERNETES_NAMESPACE, ns))
+ conf
+ }
+ conf
+ }
+
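`getKubernetesConf` above overlays settings in increasing specificity: base keys first, then `kyuubi.kubernetes.<context>.*`, then `kyuubi.kubernetes.<context>.<namespace>.*`. A sketch of that precedence over a plain map (the keys and values are illustrative, not from the patch):

```scala
val prefix = "kyuubi.kubernetes"
val settings = Map(
  s"$prefix.master.address" -> "https://default:6443",
  s"$prefix.ctx1.master.address" -> "https://ctx1:6443",
  s"$prefix.ctx1.ns1.master.address" -> "https://ctx1-ns1:6443")

// Re-map keys found under `p.` back onto the base prefix,
// similar to what getAllWithPrefix does in the method above.
def overlay(p: String): Map[String, String] =
  settings.collect {
    case (k, v) if k.startsWith(p + ".") =>
      s"$prefix.${k.stripPrefix(p + ".")}" -> v
  }

// Later maps win: namespace-level values override context-level and base values.
val resolved = settings ++ overlay(s"$prefix.ctx1") ++ overlay(s"$prefix.ctx1.ns1")
println(resolved(s"$prefix.master.address")) // https://ctx1-ns1:6443
```

The right-hand side of `++` always wins for duplicate keys, which is exactly the precedence the method encodes by applying the namespace-scoped conf last.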
/**
* Retrieve key-value pairs from [[KyuubiConf]] starting with `dropped.remainder`, and put them to
* the result map with the `dropped` of key being dropped.
@@ -189,6 +213,8 @@ case class KyuubiConf(loadSysDefault: Boolean = true) extends Logging {
s"and may be removed in the future. $comment")
}
}
+
+ def isRESTEnabled: Boolean = get(FRONTEND_PROTOCOLS).contains(FrontendProtocols.REST.toString)
}
/**
@@ -206,6 +232,7 @@ object KyuubiConf {
final val KYUUBI_HOME = "KYUUBI_HOME"
final val KYUUBI_ENGINE_ENV_PREFIX = "kyuubi.engineEnv"
final val KYUUBI_BATCH_CONF_PREFIX = "kyuubi.batchConf"
+ final val KYUUBI_KUBERNETES_CONF_PREFIX = "kyuubi.kubernetes"
final val USER_DEFAULTS_CONF_QUOTE = "___"
private[this] val kyuubiConfEntriesUpdateLock = new Object
@@ -386,11 +413,9 @@ object KyuubiConf {
"")
.version("1.4.0")
.stringConf
+ .transformToUpperCase
.toSequence()
- .transform(_.map(_.toUpperCase(Locale.ROOT)))
- .checkValue(
- _.forall(FrontendProtocols.values.map(_.toString).contains),
- s"the frontend protocol should be one or more of ${FrontendProtocols.values.mkString(",")}")
+ .checkValues(FrontendProtocols)
.createWithDefault(Seq(
FrontendProtocols.THRIFT_BINARY.toString,
FrontendProtocols.REST.toString))
@@ -402,6 +427,16 @@ object KyuubiConf {
.stringConf
.createOptional
+ val FRONTEND_ADVERTISED_HOST: OptionalConfigEntry[String] =
+ buildConf("kyuubi.frontend.advertised.host")
+ .doc("Hostname or IP of the Kyuubi server's frontend services to publish to " +
+ "external systems such as the service discovery ensemble and metadata store. " +
+ "Use it when you want to advertise a different hostname or IP than the bind host.")
+ .version("1.8.0")
+ .serverOnly
+ .stringConf
+ .createOptional
+
val FRONTEND_THRIFT_BINARY_BIND_HOST: ConfigEntry[Option[String]] =
buildConf("kyuubi.frontend.thrift.binary.bind.host")
.doc("Hostname or IP of the machine on which to run the thrift frontend service " +
@@ -446,13 +481,13 @@ object KyuubiConf {
.stringConf
.createOptional
- val FRONTEND_THRIFT_BINARY_SSL_DISALLOWED_PROTOCOLS: ConfigEntry[Seq[String]] =
+ val FRONTEND_THRIFT_BINARY_SSL_DISALLOWED_PROTOCOLS: ConfigEntry[Set[String]] =
buildConf("kyuubi.frontend.thrift.binary.ssl.disallowed.protocols")
.doc("SSL versions to disallow for Kyuubi thrift binary frontend.")
.version("1.7.0")
.stringConf
- .toSequence()
- .createWithDefault(Seq("SSLv2", "SSLv3"))
+ .toSet()
+ .createWithDefault(Set("SSLv2", "SSLv3"))
val FRONTEND_THRIFT_BINARY_SSL_INCLUDE_CIPHER_SUITES: ConfigEntry[Seq[String]] =
buildConf("kyuubi.frontend.thrift.binary.ssl.include.ciphersuites")
@@ -728,7 +763,7 @@ object KyuubiConf {
.stringConf
.createWithDefault("X-Real-IP")
- val AUTHENTICATION_METHOD: ConfigEntry[Seq[String]] = buildConf("kyuubi.authentication")
+ val AUTHENTICATION_METHOD: ConfigEntry[Set[String]] = buildConf("kyuubi.authentication")
.doc("A comma-separated list of client authentication types." +
"
" +
"
NOSASL: raw transport.
" +
@@ -763,12 +798,10 @@ object KyuubiConf {
.version("1.0.0")
.serverOnly
.stringConf
- .toSequence()
- .transform(_.map(_.toUpperCase(Locale.ROOT)))
- .checkValue(
- _.forall(AuthTypes.values.map(_.toString).contains),
- s"the authentication type should be one or more of ${AuthTypes.values.mkString(",")}")
- .createWithDefault(Seq(AuthTypes.NONE.toString))
+ .transformToUpperCase
+ .toSet()
+ .checkValues(AuthTypes)
+ .createWithDefault(Set(AuthTypes.NONE.toString))
val AUTHENTICATION_CUSTOM_CLASS: OptionalConfigEntry[String] =
buildConf("kyuubi.authentication.custom.class")
@@ -824,25 +857,25 @@ object KyuubiConf {
.stringConf
.createOptional
- val AUTHENTICATION_LDAP_GROUP_FILTER: ConfigEntry[Seq[String]] =
+ val AUTHENTICATION_LDAP_GROUP_FILTER: ConfigEntry[Set[String]] =
buildConf("kyuubi.authentication.ldap.groupFilter")
.doc("COMMA-separated list of LDAP Group names (short name not full DNs). " +
"For example: HiveAdmins,HadoopAdmins,Administrators")
.version("1.7.0")
.serverOnly
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
- val AUTHENTICATION_LDAP_USER_FILTER: ConfigEntry[Seq[String]] =
+ val AUTHENTICATION_LDAP_USER_FILTER: ConfigEntry[Set[String]] =
buildConf("kyuubi.authentication.ldap.userFilter")
.doc("COMMA-separated list of LDAP usernames (just short names, not full DNs). " +
"For example: hiveuser,impalauser,hiveadmin,hadoopadmin")
.version("1.7.0")
.serverOnly
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val AUTHENTICATION_LDAP_GUID_KEY: ConfigEntry[String] =
buildConf("kyuubi.authentication.ldap.guidKey")
@@ -999,8 +1032,8 @@ object KyuubiConf {
.version("1.0.0")
.serverOnly
.stringConf
- .checkValues(SaslQOP.values.map(_.toString))
- .transform(_.toLowerCase(Locale.ROOT))
+ .checkValues(SaslQOP)
+ .transformToLowerCase
.createWithDefault(SaslQOP.AUTH.toString)
val FRONTEND_REST_BIND_HOST: ConfigEntry[Option[String]] =
@@ -1105,6 +1138,15 @@ object KyuubiConf {
.stringConf
.createOptional
+ val KUBERNETES_CONTEXT_ALLOW_LIST: ConfigEntry[Set[String]] =
+ buildConf("kyuubi.kubernetes.context.allow.list")
+ .doc("The allowed Kubernetes context list; if it is empty," +
+ " there is no Kubernetes context limitation.")
+ .version("1.8.0")
+ .stringConf
+ .toSet()
+ .createWithDefault(Set.empty)
+
val KUBERNETES_NAMESPACE: ConfigEntry[String] =
buildConf("kyuubi.kubernetes.namespace")
.doc("The namespace that will be used for running the kyuubi pods and find engines.")
@@ -1112,6 +1154,15 @@ object KyuubiConf {
.stringConf
.createWithDefault("default")
+ val KUBERNETES_NAMESPACE_ALLOW_LIST: ConfigEntry[Set[String]] =
+ buildConf("kyuubi.kubernetes.namespace.allow.list")
+ .doc("The allowed Kubernetes namespace list; if it is empty," +
+ " there is no Kubernetes namespace limitation.")
+ .version("1.8.0")
+ .stringConf
+ .toSet()
+ .createWithDefault(Set.empty)
+
val KUBERNETES_MASTER: OptionalConfigEntry[String] =
buildConf("kyuubi.kubernetes.master.address")
.doc("The internal Kubernetes master (API server) address to be used for kyuubi.")
@@ -1237,6 +1288,16 @@ object KyuubiConf {
.timeConf
.createWithDefault(0)
+ val ENGINE_SPARK_MAX_INITIAL_WAIT: ConfigEntry[Long] =
+ buildConf("kyuubi.session.engine.spark.max.initial.wait")
+ .doc("Max wait time for the initial connection to the Spark engine. The engine will" +
+ " self-terminate if no new incoming connection is established within this time." +
+ " This setting only applies at the CONNECTION share level." +
+ " 0 or negative means not to self-terminate.")
+ .version("1.8.0")
+ .timeConf
+ .createWithDefault(Duration.ofSeconds(60).toMillis)
+
val ENGINE_FLINK_MAIN_RESOURCE: OptionalConfigEntry[String] =
buildConf("kyuubi.session.engine.flink.main.resource")
.doc("The package used to create Flink SQL engine remote job. If it is undefined," +
@@ -1254,6 +1315,15 @@ object KyuubiConf {
.intConf
.createWithDefault(1000000)
+ val ENGINE_FLINK_FETCH_TIMEOUT: OptionalConfigEntry[Long] =
+ buildConf("kyuubi.session.engine.flink.fetch.timeout")
+ .doc("Result fetch timeout for the Flink engine. If the timeout is reached, the result " +
+ "fetch would be stopped and the currently fetched results would be returned. If no data are " +
+ "fetched, a TimeoutException would be thrown.")
+ .version("1.8.0")
+ .timeConf
+ .createOptional
+
val ENGINE_TRINO_MAIN_RESOURCE: OptionalConfigEntry[String] =
buildConf("kyuubi.session.engine.trino.main.resource")
.doc("The package used to create Trino engine remote job. If it is undefined," +
@@ -1276,6 +1346,55 @@ object KyuubiConf {
.stringConf
.createOptional
+ val ENGINE_TRINO_CONNECTION_PASSWORD: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.password")
+ .doc("The password used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_KEYSTORE_PATH: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.keystore.path")
+ .doc("The keystore path used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_KEYSTORE_PASSWORD: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.keystore.password")
+ .doc("The keystore password used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_KEYSTORE_TYPE: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.keystore.type")
+ .doc("The keystore type used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_TRUSTSTORE_PATH: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.truststore.path")
+ .doc("The truststore path used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_TRUSTSTORE_PASSWORD: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.truststore.password")
+ .doc("The truststore password used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
+ val ENGINE_TRINO_CONNECTION_TRUSTSTORE_TYPE: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.trino.connection.truststore.type")
+ .doc("The truststore type used for connecting to the Trino cluster")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
val ENGINE_TRINO_SHOW_PROGRESS: ConfigEntry[Boolean] =
buildConf("kyuubi.session.engine.trino.showProgress")
.doc("When true, show the progress bar and final info in the Trino engine log.")
@@ -1304,6 +1423,14 @@ object KyuubiConf {
.timeConf
.createWithDefault(Duration.ofSeconds(15).toMillis)
+ val ENGINE_ALIVE_MAX_FAILURES: ConfigEntry[Int] =
+ buildConf("kyuubi.session.engine.alive.max.failures")
+ .doc("The maximum number of failures allowed for the engine.")
+ .version("1.8.0")
+ .intConf
+ .checkValue(_ > 0, "Must be positive")
+ .createWithDefault(3)
+
val ENGINE_ALIVE_PROBE_ENABLED: ConfigEntry[Boolean] =
buildConf("kyuubi.session.engine.alive.probe.enabled")
.doc("Whether to enable the engine alive probe; if true, we will create a companion thrift" +
@@ -1394,7 +1521,7 @@ object KyuubiConf {
.timeConf
.createWithDefault(Duration.ofMinutes(30L).toMillis)
- val SESSION_CONF_IGNORE_LIST: ConfigEntry[Seq[String]] =
+ val SESSION_CONF_IGNORE_LIST: ConfigEntry[Set[String]] =
buildConf("kyuubi.session.conf.ignore.list")
.doc("A comma-separated list of ignored keys. If the client connection contains any of" +
" them, the key and the corresponding value will be removed silently during engine" +
@@ -1404,10 +1531,10 @@ object KyuubiConf {
" configurations via SET syntax.")
.version("1.2.0")
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
- val SESSION_CONF_RESTRICT_LIST: ConfigEntry[Seq[String]] =
+ val SESSION_CONF_RESTRICT_LIST: ConfigEntry[Set[String]] =
buildConf("kyuubi.session.conf.restrict.list")
.doc("A comma-separated list of restricted keys. If the client connection contains any of" +
" them, the connection will be rejected explicitly during engine bootstrap and connection" +
@@ -1417,8 +1544,8 @@ object KyuubiConf {
" configurations via SET syntax.")
.version("1.2.0")
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val SESSION_USER_SIGN_ENABLED: ConfigEntry[Boolean] =
buildConf("kyuubi.session.user.sign.enabled")
@@ -1448,6 +1575,15 @@ object KyuubiConf {
.booleanConf
.createWithDefault(true)
+ val SESSION_ENGINE_STARTUP_DESTROY_TIMEOUT: ConfigEntry[Long] =
+ buildConf("kyuubi.session.engine.startup.destroy.timeout")
+ .doc("Time to wait for the engine startup process to be destroyed; if the process does " +
+ "not stop within this time, it will be destroyed forcibly instead. This configuration only " +
+ s"takes effect when `${SESSION_ENGINE_STARTUP_WAIT_COMPLETION.key}=false`.")
+ .version("1.8.0")
+ .timeConf
+ .createWithDefault(Duration.ofSeconds(5).toMillis)
+
val SESSION_ENGINE_LAUNCH_ASYNC: ConfigEntry[Boolean] =
buildConf("kyuubi.session.engine.launch.async")
.doc("When opening kyuubi session, whether to launch the backend engine asynchronously." +
@@ -1457,7 +1593,7 @@ object KyuubiConf {
.booleanConf
.createWithDefault(true)
- val SESSION_LOCAL_DIR_ALLOW_LIST: ConfigEntry[Seq[String]] =
+ val SESSION_LOCAL_DIR_ALLOW_LIST: ConfigEntry[Set[String]] =
buildConf("kyuubi.session.local.dir.allow.list")
.doc("The local dir list that are allowed to access by the kyuubi session application. " +
" End-users might set some parameters such as `spark.files` and it will " +
@@ -1470,8 +1606,8 @@ object KyuubiConf {
.stringConf
.checkValue(dir => dir.startsWith(File.separator), "the dir should be absolute path")
.transform(dir => dir.stripSuffix(File.separator) + File.separator)
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val BATCH_APPLICATION_CHECK_INTERVAL: ConfigEntry[Long] =
buildConf("kyuubi.batch.application.check.interval")
@@ -1487,7 +1623,7 @@ object KyuubiConf {
.timeConf
.createWithDefault(Duration.ofMinutes(3).toMillis)
- val BATCH_CONF_IGNORE_LIST: ConfigEntry[Seq[String]] =
+ val BATCH_CONF_IGNORE_LIST: ConfigEntry[Set[String]] =
buildConf("kyuubi.batch.conf.ignore.list")
.doc("A comma-separated list of ignored keys for batch conf. If the batch conf contains" +
" any of them, the key and the corresponding value will be removed silently during batch" +
@@ -1499,8 +1635,8 @@ object KyuubiConf {
" for the Spark batch job with key `kyuubi.batchConf.spark.spark.master`.")
.version("1.6.0")
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val BATCH_INTERNAL_REST_CLIENT_SOCKET_TIMEOUT: ConfigEntry[Long] =
buildConf("kyuubi.batch.internal.rest.client.socket.timeout")
@@ -1538,6 +1674,42 @@ object KyuubiConf {
.booleanConf
.createWithDefault(true)
+ val BATCH_SUBMITTER_ENABLED: ConfigEntry[Boolean] =
+ buildConf("kyuubi.batch.submitter.enabled")
+ .internal
+ .serverOnly
+ .doc("Batch API v2 requires the batch submitter to pick INITIALIZED batch jobs " +
+ "from the metastore and submit them to the Resource Manager. " +
+ "Note: Batch API v2 is experimental and under rapid development; this configuration " +
+ "is added to let explorers conveniently test the developing Batch v2 API. It is not " +
+ "intended to be exposed to end users, and it may be removed at any time.")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val BATCH_SUBMITTER_THREADS: ConfigEntry[Int] =
+ buildConf("kyuubi.batch.submitter.threads")
+ .internal
+ .serverOnly
+ .doc("Number of threads in the batch job submitter; this configuration only takes effect " +
+ s"when ${BATCH_SUBMITTER_ENABLED.key} is enabled")
+ .version("1.8.0")
+ .intConf
+ .createWithDefault(16)
+
+ val BATCH_IMPL_VERSION: ConfigEntry[String] =
+ buildConf("kyuubi.batch.impl.version")
+ .internal
+ .serverOnly
+ .doc("Batch API version, candidates: 1, 2. Only takes effect when " +
+ s"${BATCH_SUBMITTER_ENABLED.key} is true; otherwise the v1 implementation is always used. " +
+ "Note: Batch API v2 is experimental and under rapid development; this configuration " +
+ "is added to let explorers conveniently test the developing Batch v2 API. It is not " +
+ "intended to be exposed to end users, and it may be removed at any time.")
+ .version("1.8.0")
+ .stringConf
+ .createWithDefault("1")
+
val SERVER_EXEC_POOL_SIZE: ConfigEntry[Int] =
buildConf("kyuubi.backend.server.exec.pool.size")
.doc("Number of threads in the operation execution thread pool of Kyuubi server")
@@ -1732,7 +1904,7 @@ object KyuubiConf {
.version("1.7.0")
.stringConf
.checkValues(Set("arrow", "thrift"))
- .transform(_.toLowerCase(Locale.ROOT))
+ .transformToLowerCase
.createWithDefault("thrift")
val ARROW_BASED_ROWSET_TIMESTAMP_AS_STRING: ConfigEntry[Boolean] =
@@ -1757,8 +1929,8 @@ object KyuubiConf {
.doc(s"(deprecated) - Using kyuubi.engine.share.level instead")
.version("1.0.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
- .checkValues(ShareLevel.values.map(_.toString))
+ .transformToUpperCase
+ .checkValues(ShareLevel)
.createWithDefault(ShareLevel.USER.toString)
// [ZooKeeper Data Model]
@@ -1772,7 +1944,7 @@ object KyuubiConf {
.doc("(deprecated) - Using kyuubi.engine.share.level.subdomain instead")
.version("1.2.0")
.stringConf
- .transform(_.toLowerCase(Locale.ROOT))
+ .transformToLowerCase
.checkValue(validZookeeperSubPath.matcher(_).matches(), "must be valid zookeeper sub path.")
.createOptional
@@ -1838,8 +2010,8 @@ object KyuubiConf {
"</ul>")
.version("1.4.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
- .checkValues(EngineType.values.map(_.toString))
+ .transformToUpperCase
+ .checkValues(EngineType)
.createWithDefault(EngineType.SPARK_SQL.toString)
val ENGINE_POOL_IGNORE_SUBDOMAIN: ConfigEntry[Boolean] =
@@ -1862,6 +2034,7 @@ object KyuubiConf {
.doc("This parameter is introduced as a server-side parameter " +
"controlling the upper limit of the engine pool.")
.version("1.4.0")
+ .serverOnly
.intConf
.checkValue(s => s > 0 && s < 33, "Invalid engine pool threshold, it should be in [1, 32]")
.createWithDefault(9)
@@ -1884,7 +2057,7 @@ object KyuubiConf {
"")
.version("1.7.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
+ .transformToUpperCase
.checkValues(Set("RANDOM", "POLLING"))
.createWithDefault("RANDOM")
@@ -1908,24 +2081,24 @@ object KyuubiConf {
.toSequence(";")
.createWithDefault(Nil)
- val ENGINE_DEREGISTER_EXCEPTION_CLASSES: ConfigEntry[Seq[String]] =
+ val ENGINE_DEREGISTER_EXCEPTION_CLASSES: ConfigEntry[Set[String]] =
buildConf("kyuubi.engine.deregister.exception.classes")
.doc("A comma-separated list of exception classes. If there is any exception thrown," +
" whose class matches the specified classes, the engine would deregister itself.")
.version("1.2.0")
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
- val ENGINE_DEREGISTER_EXCEPTION_MESSAGES: ConfigEntry[Seq[String]] =
+ val ENGINE_DEREGISTER_EXCEPTION_MESSAGES: ConfigEntry[Set[String]] =
buildConf("kyuubi.engine.deregister.exception.messages")
.doc("A comma-separated list of exception messages. If there is any exception thrown," +
" whose message or stacktrace matches the specified message list, the engine would" +
" deregister itself.")
.version("1.2.0")
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val ENGINE_DEREGISTER_JOB_MAX_FAILURES: ConfigEntry[Int] =
buildConf("kyuubi.engine.deregister.job.max.failures")
@@ -2007,12 +2180,34 @@ object KyuubiConf {
.stringConf
.createWithDefault("file:///tmp/kyuubi/events")
+ val SERVER_EVENT_KAFKA_TOPIC: OptionalConfigEntry[String] =
+ buildConf("kyuubi.backend.server.event.kafka.topic")
+ .doc("The topic of server events go for the built-in Kafka logger")
+ .version("1.8.0")
+ .serverOnly
+ .stringConf
+ .createOptional
+
+ val SERVER_EVENT_KAFKA_CLOSE_TIMEOUT: ConfigEntry[Long] =
+ buildConf("kyuubi.backend.server.event.kafka.close.timeout")
+ .doc("Period to wait for Kafka producer of server event handlers to close.")
+ .version("1.8.0")
+ .serverOnly
+ .timeConf
+ .createWithDefault(Duration.ofMillis(5000).toMillis)
+
val SERVER_EVENT_LOGGERS: ConfigEntry[Seq[String]] =
buildConf("kyuubi.backend.server.event.loggers")
.doc("A comma-separated list of server history loggers, where session/operation etc" +
" events go.<ul>" +
s"<li>JSON: the events will be written to the location of" +
s" ${SERVER_EVENT_JSON_LOG_PATH.key}</li>" +
+ s"<li>KAFKA: the events will be serialized in JSON format" +
+ s" and sent to topic of `${SERVER_EVENT_KAFKA_TOPIC.key}`." +
+ s" Note: For the configs of Kafka producer," +
+ s" please specify them with the prefix: `kyuubi.backend.server.event.kafka.`." +
+ s" For example, `kyuubi.backend.server.event.kafka.bootstrap.servers=127.0.0.1:9092`" +
+ s"</li>" +
s"<li>JDBC: to be done</li>" +
s"<li>CUSTOM: User-defined event handlers.</li>" +
" Note that: Kyuubi supports custom event handlers with the Java SPI." +
@@ -2023,9 +2218,11 @@ object KyuubiConf {
.version("1.4.0")
.serverOnly
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
+ .transformToUpperCase
.toSequence()
- .checkValue(_.toSet.subsetOf(Set("JSON", "JDBC", "CUSTOM")), "Unsupported event loggers")
+ .checkValue(
+ _.toSet.subsetOf(Set("JSON", "JDBC", "CUSTOM", "KAFKA")),
+ "Unsupported event loggers")
.createWithDefault(Nil)
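The KAFKA logger documentation above says producer configs are passed under the `kyuubi.backend.server.event.kafka.` prefix. A sketch of that prefix extraction over a plain map (the key names are illustrative, and the real handler may additionally exclude its own keys such as the topic and close timeout):

```scala
val kafkaPrefix = "kyuubi.backend.server.event.kafka."
val conf = Map(
  "kyuubi.backend.server.event.kafka.bootstrap.servers" -> "127.0.0.1:9092",
  "kyuubi.backend.server.event.kafka.acks" -> "all",
  "kyuubi.backend.server.event.loggers" -> "KAFKA")

// Strip the prefix to obtain plain Kafka producer property names;
// keys outside the prefix are left out of the producer config.
val producerProps = conf.collect {
  case (k, v) if k.startsWith(kafkaPrefix) => k.stripPrefix(kafkaPrefix) -> v
}
println(producerProps)
```

This keeps Kyuubi's own keys and pass-through producer properties in one flat configuration namespace.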
@deprecated("using kyuubi.engine.spark.event.loggers instead", "1.6.0")
@@ -2045,7 +2242,7 @@ object KyuubiConf {
" which has a zero-arg constructor.")
.version("1.3.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
+ .transformToUpperCase
.toSequence()
.checkValue(
_.toSet.subsetOf(Set("SPARK", "JSON", "JDBC", "CUSTOM")),
@@ -2171,14 +2368,14 @@ object KyuubiConf {
val OPERATION_PLAN_ONLY_MODE: ConfigEntry[String] =
buildConf("kyuubi.operation.plan.only.mode")
.doc("Configures the statement performed mode, The value can be 'parse', 'analyze', " +
- "'optimize', 'optimize_with_stats', 'physical', 'execution', or 'none', " +
+ "'optimize', 'optimize_with_stats', 'physical', 'execution', 'lineage' or 'none', " +
"when it is 'none', it indicates that the statement will be fully executed; otherwise " +
"the statement is only planned, without executing the query. Different engines currently support different " +
"modes, the Spark engine supports all modes, and the Flink engine supports 'parse', " +
"'physical', and 'execution', other engines do not support planOnly currently.")
.version("1.4.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
+ .transformToUpperCase
.checkValue(
mode =>
Set(
@@ -2188,10 +2385,11 @@ object KyuubiConf {
"OPTIMIZE_WITH_STATS",
"PHYSICAL",
"EXECUTION",
+ "LINEAGE",
"NONE").contains(mode),
"Invalid value for 'kyuubi.operation.plan.only.mode'. Valid values are" +
"'parse', 'analyze', 'optimize', 'optimize_with_stats', 'physical', 'execution' and " +
- "'none'.")
+ "'lineage', 'none'.")
.createWithDefault(NoneMode.name)
val OPERATION_PLAN_ONLY_OUT_STYLE: ConfigEntry[String] =
@@ -2201,14 +2399,11 @@ object KyuubiConf {
"of the Spark engine")
.version("1.7.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
- .checkValue(
- mode => Set("PLAIN", "JSON").contains(mode),
- "Invalid value for 'kyuubi.operation.plan.only.output.style'. Valid values are " +
- "'plain', 'json'.")
+ .transformToUpperCase
+ .checkValues(Set("PLAIN", "JSON"))
.createWithDefault(PlainStyle.name)
- val OPERATION_PLAN_ONLY_EXCLUDES: ConfigEntry[Seq[String]] =
+ val OPERATION_PLAN_ONLY_EXCLUDES: ConfigEntry[Set[String]] =
buildConf("kyuubi.operation.plan.only.excludes")
.doc("Comma-separated list of query plan names, in the form of simple class names, i.e., " +
"for `SET abc=xyz`, the value will be `SetCommand`. For those auxiliary plans, such as " +
@@ -2218,14 +2413,21 @@ object KyuubiConf {
s"See also ${OPERATION_PLAN_ONLY_MODE.key}.")
.version("1.5.0")
.stringConf
- .toSequence()
- .createWithDefault(Seq(
+ .toSet()
+ .createWithDefault(Set(
"ResetCommand",
"SetCommand",
"SetNamespaceCommand",
"UseStatement",
"SetCatalogAndNamespace"))
+ val LINEAGE_PARSER_PLUGIN_PROVIDER: ConfigEntry[String] =
+ buildConf("kyuubi.lineage.parser.plugin.provider")
+ .doc("The provider for the Spark lineage parser plugin.")
+ .version("1.8.0")
+ .stringConf
+ .createWithDefault("org.apache.kyuubi.plugin.lineage.LineageParserProvider")
+
object OperationLanguages extends Enumeration with Logging {
type OperationLanguage = Value
val PYTHON, SQL, SCALA, UNKNOWN = Value
@@ -2252,8 +2454,8 @@ object KyuubiConf {
"")
.version("1.5.0")
.stringConf
- .transform(_.toUpperCase(Locale.ROOT))
- .checkValues(OperationLanguages.values.map(_.toString))
+ .transformToUpperCase
+ .checkValues(OperationLanguages)
.createWithDefault(OperationLanguages.SQL.toString)
val SESSION_CONF_ADVISOR: OptionalConfigEntry[String] =
@@ -2367,14 +2569,14 @@ object KyuubiConf {
val ENGINE_FLINK_MEMORY: ConfigEntry[String] =
buildConf("kyuubi.engine.flink.memory")
- .doc("The heap memory for the Flink SQL engine")
+ .doc("The heap memory for the Flink SQL engine. Only effective in yarn session mode.")
.version("1.6.0")
.stringConf
.createWithDefault("1g")
val ENGINE_FLINK_JAVA_OPTIONS: OptionalConfigEntry[String] =
buildConf("kyuubi.engine.flink.java.options")
- .doc("The extra Java options for the Flink SQL engine")
+ .doc("The extra Java options for the Flink SQL engine. Only effective in yarn session mode.")
.version("1.6.0")
.stringConf
.createOptional
@@ -2382,11 +2584,19 @@ object KyuubiConf {
val ENGINE_FLINK_EXTRA_CLASSPATH: OptionalConfigEntry[String] =
buildConf("kyuubi.engine.flink.extra.classpath")
.doc("The extra classpath for the Flink SQL engine, for configuring the location" +
- " of hadoop client jars, etc")
+ " of hadoop client jars, etc. Only effective in yarn session mode.")
.version("1.6.0")
.stringConf
.createOptional
+ val ENGINE_FLINK_APPLICATION_JARS: OptionalConfigEntry[String] =
+ buildConf("kyuubi.engine.flink.application.jars")
+ .doc("A comma-separated list of the local jars to be shipped with the job to the cluster. " +
+ "For example, SQL UDF jars. Only effective in yarn application mode.")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
val SERVER_LIMIT_CONNECTIONS_PER_USER: OptionalConfigEntry[Int] =
buildConf("kyuubi.server.limit.connections.per.user")
.doc("Maximum kyuubi server connections per user." +
@@ -2414,14 +2624,25 @@ object KyuubiConf {
.intConf
.createOptional
- val SERVER_LIMIT_CONNECTIONS_USER_UNLIMITED_LIST: ConfigEntry[Seq[String]] =
+ val SERVER_LIMIT_CONNECTIONS_USER_UNLIMITED_LIST: ConfigEntry[Set[String]] =
buildConf("kyuubi.server.limit.connections.user.unlimited.list")
.doc("The maximum number of connections will not be limited for users in the white list.")
.version("1.7.0")
.serverOnly
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
+
+ val SERVER_LIMIT_CONNECTIONS_USER_DENY_LIST: ConfigEntry[Set[String]] =
+ buildConf("kyuubi.server.limit.connections.user.deny.list")
+ .doc("Users in the deny list will be denied connection to the Kyuubi server; " +
+ "if a user is configured in both user.unlimited.list and user.deny.list, " +
+ "the latter takes higher priority.")
+ .version("1.8.0")
+ .serverOnly
+ .stringConf
+ .toSet()
+ .createWithDefault(Set.empty)
val SERVER_LIMIT_BATCH_CONNECTIONS_PER_USER: OptionalConfigEntry[Int] =
buildConf("kyuubi.server.limit.batch.connections.per.user")
@@ -2483,15 +2704,15 @@ object KyuubiConf {
.timeConf
.createWithDefaultString("PT30M")
- val SERVER_ADMINISTRATORS: ConfigEntry[Seq[String]] =
+ val SERVER_ADMINISTRATORS: ConfigEntry[Set[String]] =
buildConf("kyuubi.server.administrators")
.doc("Comma-separated list of Kyuubi service administrators. " +
"We use this config to grant admin permission to any service accounts.")
.version("1.8.0")
.serverOnly
.stringConf
- .toSequence()
- .createWithDefault(Nil)
+ .toSet()
+ .createWithDefault(Set.empty)
val OPERATION_SPARK_LISTENER_ENABLED: ConfigEntry[Boolean] =
buildConf("kyuubi.operation.spark.listener.enabled")
@@ -2515,6 +2736,13 @@ object KyuubiConf {
.stringConf
.createOptional
+ val ENGINE_JDBC_CONNECTION_PROPAGATECREDENTIAL: ConfigEntry[Boolean] =
+ buildConf("kyuubi.engine.jdbc.connection.propagateCredential")
+ .doc("Whether to use the session's user and password to connect to the database")
+ .version("1.8.0")
+ .booleanConf
+ .createWithDefault(false)
+
val ENGINE_JDBC_CONNECTION_USER: OptionalConfigEntry[String] =
buildConf("kyuubi.engine.jdbc.connection.user")
.doc("The user is used for connecting to server")
@@ -2551,6 +2779,24 @@ object KyuubiConf {
.stringConf
.createOptional
+ val ENGINE_JDBC_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+ buildConf("kyuubi.engine.jdbc.initialize.sql")
+ .doc("Semicolon-separated list of SQL statements to be initialized in the newly created " +
+ "engine before queries, e.g. use `SELECT 1` to eagerly activate the JDBC client.")
+ .version("1.8.0")
+ .stringConf
+ .toSequence(";")
+ .createWithDefaultString("SELECT 1")
+
+ val ENGINE_JDBC_SESSION_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+ buildConf("kyuubi.engine.jdbc.session.initialize.sql")
+ .doc("Semicolon-separated list of SQL statements to be initialized in the newly created " +
+ "engine session before queries.")
+ .version("1.8.0")
+ .stringConf
+ .toSequence(";")
+ .createWithDefault(Nil)
+
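The `toSequence(";")` call splits the raw config string on semicolons. A rough equivalent of that splitting, with trimming and dropping of blank entries, might look like this (an assumption about the splitting behavior, not the actual Kyuubi config parser):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Rough sketch of splitting a semicolon-separated config value such as
// kyuubi.engine.jdbc.initialize.sql into individual SQL statements.
public class InitSqlSplitter {
    public static List<String> split(String value) {
        return Arrays.stream(value.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```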
val ENGINE_OPERATION_CONVERT_CATALOG_DATABASE_ENABLED: ConfigEntry[Boolean] =
buildConf("kyuubi.engine.operation.convert.catalog.database.enabled")
.doc("When set to true, The engine converts the JDBC methods of set/get Catalog " +
@@ -2568,6 +2814,44 @@ object KyuubiConf {
.timeConf
.createWithDefaultString("PT30S")
+ val ENGINE_KUBERNETES_SUBMIT_TIMEOUT: ConfigEntry[Long] =
+ buildConf("kyuubi.engine.kubernetes.submit.timeout")
+ .doc("The engine submit timeout for Kubernetes application.")
+ .version("1.7.2")
+ .fallbackConf(ENGINE_SUBMIT_TIMEOUT)
+
+ val ENGINE_YARN_SUBMIT_TIMEOUT: ConfigEntry[Long] =
+ buildConf("kyuubi.engine.yarn.submit.timeout")
+ .doc("The engine submit timeout for YARN application.")
+ .version("1.7.2")
+ .fallbackConf(ENGINE_SUBMIT_TIMEOUT)
+
+ object YarnUserStrategy extends Enumeration {
+ type YarnUserStrategy = Value
+ val NONE, ADMIN, OWNER = Value
+ }
+
+ val YARN_USER_STRATEGY: ConfigEntry[String] =
+ buildConf("kyuubi.yarn.user.strategy")
+ .doc("Determine which user to use to construct YARN client for application management, " +
+ "e.g. kill application. Options:
" +
+ "
NONE: use Kyuubi server user.
" +
+ "
ADMIN: use admin user configured in `kyuubi.yarn.user.admin`.
" +
+ "
OWNER: use session user, typically is application owner.
" +
+ "
")
+ .version("1.8.0")
+ .stringConf
+ .checkValues(YarnUserStrategy)
+ .createWithDefault("NONE")
+
+ val YARN_USER_ADMIN: ConfigEntry[String] =
+ buildConf("kyuubi.yarn.user.admin")
+ .doc(s"When ${YARN_USER_STRATEGY.key} is set to ADMIN, use this admin user to " +
+ "construct YARN client for application management, e.g. kill application.")
+ .version("1.8.0")
+ .stringConf
+ .createWithDefault("yarn")
+
/**
* Holds information about keys that have been deprecated.
*
@@ -2792,7 +3076,7 @@ object KyuubiConf {
"</li></ul>")}")
.version("1.3.2")
.stringConf
- .checkValues(AuthTypes.values.map(_.toString))
+ .checkValues(AuthTypes)
.createWithDefault(AuthTypes.NONE.toString)
+ val HA_ZK_AUTH_SERVER_PRINCIPAL: OptionalConfigEntry[String] =
+ buildConf("kyuubi.ha.zookeeper.auth.serverPrincipal")
+ .doc("Kerberos principal name of the ZooKeeper server. It only takes effect when " +
+ "the ZooKeeper client's version is at least 3.5.7 or 3.6.0, or applies ZOOKEEPER-1467. " +
+ "To use the ZooKeeper 3.6 client, compile Kyuubi with `-Pzookeeper-3.6`.")
+ .version("1.8.0")
+ .stringConf
+ .createOptional
+
val HA_ZK_AUTH_PRINCIPAL: ConfigEntry[Option[String]] =
buildConf("kyuubi.ha.zookeeper.auth.principal")
- .doc("Name of the Kerberos principal is used for ZooKeeper authentication.")
+ .doc("Kerberos principal name that is used for ZooKeeper authentication.")
.version("1.3.2")
.fallbackConf(KyuubiConf.SERVER_PRINCIPAL)
- val HA_ZK_AUTH_KEYTAB: ConfigEntry[Option[String]] = buildConf("kyuubi.ha.zookeeper.auth.keytab")
- .doc("Location of the Kyuubi server's keytab is used for ZooKeeper authentication.")
- .version("1.3.2")
- .fallbackConf(KyuubiConf.SERVER_KEYTAB)
+ val HA_ZK_AUTH_KEYTAB: ConfigEntry[Option[String]] =
+ buildConf("kyuubi.ha.zookeeper.auth.keytab")
+ .doc("Location of the Kyuubi server's keytab that is used for ZooKeeper authentication.")
+ .version("1.3.2")
+ .fallbackConf(KyuubiConf.SERVER_KEYTAB)
- val HA_ZK_AUTH_DIGEST: OptionalConfigEntry[String] = buildConf("kyuubi.ha.zookeeper.auth.digest")
- .doc("The digest auth string is used for ZooKeeper authentication, like: username:password.")
- .version("1.3.2")
- .stringConf
- .createOptional
+ val HA_ZK_AUTH_DIGEST: OptionalConfigEntry[String] =
+ buildConf("kyuubi.ha.zookeeper.auth.digest")
+ .doc("The digest auth string used for ZooKeeper authentication, e.g. username:password.")
+ .version("1.3.2")
+ .stringConf
+ .createOptional
val HA_ZK_CONN_MAX_RETRIES: ConfigEntry[Int] =
buildConf("kyuubi.ha.zookeeper.connection.max.retries")
@@ -149,7 +160,7 @@ object HighAvailabilityConf {
s" ${RetryPolicies.values.mkString("<ul><li>", "</li><li>", "</li></ul>")}")
.version("1.0.0")
.stringConf
- .checkValues(RetryPolicies.values.map(_.toString))
+ .checkValues(RetryPolicies)
.createWithDefault(RetryPolicies.EXPONENTIAL_BACKOFF.toString)
val HA_ZK_NODE_TIMEOUT: ConfigEntry[Long] =
@@ -209,14 +220,14 @@ object HighAvailabilityConf {
.stringConf
.createOptional
- val HA_ETCD_SSL_CLINET_CRT_PATH: OptionalConfigEntry[String] =
+ val HA_ETCD_SSL_CLIENT_CRT_PATH: OptionalConfigEntry[String] =
buildConf("kyuubi.ha.etcd.ssl.client.certificate.path")
.doc("Where the etcd SSL certificate file is stored.")
.version("1.6.0")
.stringConf
.createOptional
- val HA_ETCD_SSL_CLINET_KEY_PATH: OptionalConfigEntry[String] =
+ val HA_ETCD_SSL_CLIENT_KEY_PATH: OptionalConfigEntry[String] =
buildConf("kyuubi.ha.etcd.ssl.client.key.path")
.doc("Where the etcd SSL key file is stored.")
.version("1.6.0")
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/DiscoveryPaths.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/DiscoveryPaths.scala
index 987a88ddafd..fe7ebe2ab86 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/DiscoveryPaths.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/DiscoveryPaths.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.ha.client
-import org.apache.curator.utils.ZKPaths
+import org.apache.kyuubi.shaded.curator.utils.ZKPaths
object DiscoveryPaths {
def makePath(parent: String, firstChild: String, restChildren: String*): String = {
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/ServiceDiscovery.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/ServiceDiscovery.scala
index bdb9b12fe82..a1b1466d122 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/ServiceDiscovery.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/ServiceDiscovery.scala
@@ -60,6 +60,7 @@ abstract class ServiceDiscovery(
override def start(): Unit = {
discoveryClient.registerService(conf, namespace, this)
+ info(s"Registered $name in namespace ${_namespace}.")
super.start()
}
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala
index 80a70f2f218..d979804f417 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala
@@ -74,10 +74,10 @@ class EtcdDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
} else {
val caPath = conf.getOption(HA_ETCD_SSL_CA_PATH.key).getOrElse(
throw new IllegalArgumentException(s"${HA_ETCD_SSL_CA_PATH.key} is not defined"))
- val crtPath = conf.getOption(HA_ETCD_SSL_CLINET_CRT_PATH.key).getOrElse(
- throw new IllegalArgumentException(s"${HA_ETCD_SSL_CLINET_CRT_PATH.key} is not defined"))
- val keyPath = conf.getOption(HA_ETCD_SSL_CLINET_KEY_PATH.key).getOrElse(
- throw new IllegalArgumentException(s"${HA_ETCD_SSL_CLINET_KEY_PATH.key} is not defined"))
+ val crtPath = conf.getOption(HA_ETCD_SSL_CLIENT_CRT_PATH.key).getOrElse(
+ throw new IllegalArgumentException(s"${HA_ETCD_SSL_CLIENT_CRT_PATH.key} is not defined"))
+ val keyPath = conf.getOption(HA_ETCD_SSL_CLIENT_KEY_PATH.key).getOrElse(
+ throw new IllegalArgumentException(s"${HA_ETCD_SSL_CLIENT_KEY_PATH.key} is not defined"))
val context = GrpcSslContexts.forClient()
.trustManager(new File(caPath))
@@ -358,11 +358,11 @@ class EtcdDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
client.getLeaseClient.keepAlive(
leaseId,
new StreamObserver[LeaseKeepAliveResponse] {
- override def onNext(v: LeaseKeepAliveResponse): Unit = Unit // do nothing
+ override def onNext(v: LeaseKeepAliveResponse): Unit = () // do nothing
- override def onError(throwable: Throwable): Unit = Unit // do nothing
+ override def onError(throwable: Throwable): Unit = () // do nothing
- override def onCompleted(): Unit = Unit // do nothing
+ override def onCompleted(): Unit = () // do nothing
})
client.getKVClient.put(
ByteSequence.from(realPath.getBytes()),
@@ -388,7 +388,7 @@ class EtcdDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
override def onError(throwable: Throwable): Unit =
throw new KyuubiException(throwable.getMessage, throwable.getCause)
- override def onCompleted(): Unit = Unit
+ override def onCompleted(): Unit = ()
}
}
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperACLProvider.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperACLProvider.scala
index 467c323b77e..87ea65c17a2 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperACLProvider.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperACLProvider.scala
@@ -17,13 +17,12 @@
package org.apache.kyuubi.ha.client.zookeeper
-import org.apache.curator.framework.api.ACLProvider
-import org.apache.zookeeper.ZooDefs
-import org.apache.zookeeper.data.ACL
-
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.ha.HighAvailabilityConf
import org.apache.kyuubi.ha.client.AuthTypes
+import org.apache.kyuubi.shaded.curator.framework.api.ACLProvider
+import org.apache.kyuubi.shaded.zookeeper.ZooDefs
+import org.apache.kyuubi.shaded.zookeeper.data.ACL
class ZookeeperACLProvider(conf: KyuubiConf) extends ACLProvider {
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperClientProvider.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperClientProvider.scala
index 8dd32d6b62b..d0749c8d923 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperClientProvider.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperClientProvider.scala
@@ -18,22 +18,23 @@
package org.apache.kyuubi.ha.client.zookeeper
import java.io.{File, IOException}
+import java.nio.charset.StandardCharsets
import javax.security.auth.login.Configuration
import scala.util.Random
import com.google.common.annotations.VisibleForTesting
-import org.apache.curator.framework.{CuratorFramework, CuratorFrameworkFactory}
-import org.apache.curator.retry._
import org.apache.hadoop.security.UserGroupInformation
-import org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.JaasConfiguration
import org.apache.kyuubi.Logging
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.ha.HighAvailabilityConf._
import org.apache.kyuubi.ha.client.{AuthTypes, RetryPolicies}
import org.apache.kyuubi.ha.client.RetryPolicies._
+import org.apache.kyuubi.shaded.curator.framework.{CuratorFramework, CuratorFrameworkFactory}
+import org.apache.kyuubi.shaded.curator.retry._
import org.apache.kyuubi.util.KyuubiHadoopUtils
+import org.apache.kyuubi.util.reflect.DynConstructors
object ZookeeperClientProvider extends Logging {
@@ -65,10 +66,8 @@ object ZookeeperClientProvider extends Logging {
.aclProvider(new ZookeeperACLProvider(conf))
.retryPolicy(retryPolicy)
- conf.get(HA_ZK_AUTH_DIGEST) match {
- case Some(anthString) =>
- builder.authorization("digest", anthString.getBytes("UTF-8"))
- case _ =>
+ conf.get(HA_ZK_AUTH_DIGEST).foreach { authString =>
+ builder.authorization("digest", authString.getBytes(StandardCharsets.UTF_8))
}
builder.build()
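Besides replacing the Option match with `foreach`, the patch swaps the `"UTF-8"` string literal for `StandardCharsets.UTF_8`. The constant avoids the runtime charset lookup entirely and cannot fail for a missing charset name; the produced bytes are the same:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // String.getBytes(Charset) with StandardCharsets.UTF_8 needs no charset
    // lookup and cannot throw UnsupportedEncodingException, unlike the
    // String.getBytes(String) overload used with a "UTF-8" literal.
    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```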
@@ -103,46 +102,51 @@ object ZookeeperClientProvider extends Logging {
*/
@throws[Exception]
def setUpZooKeeperAuth(conf: KyuubiConf): Unit = {
- def setupZkAuth(): Unit = {
- val keyTabFile = getKeyTabFile(conf)
- val maybePrincipal = conf.get(HA_ZK_AUTH_PRINCIPAL)
- val kerberized = maybePrincipal.isDefined && keyTabFile.isDefined
- if (UserGroupInformation.isSecurityEnabled && kerberized) {
- if (!new File(keyTabFile.get).exists()) {
- throw new IOException(s"${HA_ZK_AUTH_KEYTAB.key}: $keyTabFile does not exists")
+ def setupZkAuth(): Unit = (conf.get(HA_ZK_AUTH_PRINCIPAL), getKeyTabFile(conf)) match {
+ case (Some(principal), Some(keytab)) if UserGroupInformation.isSecurityEnabled =>
+ if (!new File(keytab).exists()) {
+ throw new IOException(s"${HA_ZK_AUTH_KEYTAB.key}: $keytab does not exists")
}
System.setProperty("zookeeper.sasl.clientconfig", "KyuubiZooKeeperClient")
- var principal = maybePrincipal.get
- principal = KyuubiHadoopUtils.getServerPrincipal(principal)
- val jaasConf = new JaasConfiguration("KyuubiZooKeeperClient", principal, keyTabFile.get)
+ conf.get(HA_ZK_AUTH_SERVER_PRINCIPAL).foreach { zkServerPrincipal =>
+ // ZOOKEEPER-1467 allows configuring SPN in client
+ System.setProperty("zookeeper.server.principal", zkServerPrincipal)
+ }
+ val zkClientPrincipal = KyuubiHadoopUtils.getServerPrincipal(principal)
+ // HDFS-16591 makes breaking change on JaasConfiguration
+ val jaasConf = DynConstructors.builder()
+ .impl( // Hadoop 3.3.5 and above
+ "org.apache.hadoop.security.authentication.util.JaasConfiguration",
+ classOf[String],
+ classOf[String],
+ classOf[String])
+ .impl( // Hadoop 3.3.4 and previous
+ // scalastyle:off
+ "org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager$JaasConfiguration",
+ // scalastyle:on
+ classOf[String],
+ classOf[String],
+ classOf[String])
+ .build[Configuration]()
+ .newInstance("KyuubiZooKeeperClient", zkClientPrincipal, keytab)
Configuration.setConfiguration(jaasConf)
- }
+ case _ =>
}
- if (conf.get(HA_ENGINE_REF_ID).isEmpty
- && AuthTypes.withName(conf.get(HA_ZK_AUTH_TYPE)) == AuthTypes.KERBEROS) {
+ if (conf.get(HA_ENGINE_REF_ID).isEmpty &&
+ AuthTypes.withName(conf.get(HA_ZK_AUTH_TYPE)) == AuthTypes.KERBEROS) {
setupZkAuth()
- } else if (conf.get(HA_ENGINE_REF_ID).nonEmpty && AuthTypes
- .withName(conf.get(HA_ZK_ENGINE_AUTH_TYPE)) == AuthTypes.KERBEROS) {
+ } else if (conf.get(HA_ENGINE_REF_ID).nonEmpty &&
+ AuthTypes.withName(conf.get(HA_ZK_ENGINE_AUTH_TYPE)) == AuthTypes.KERBEROS) {
setupZkAuth()
}
-
}
@VisibleForTesting
def getKeyTabFile(conf: KyuubiConf): Option[String] = {
- val zkAuthKeytab = conf.get(HA_ZK_AUTH_KEYTAB)
- if (zkAuthKeytab.isDefined) {
- val zkAuthKeytabPath = zkAuthKeytab.get
- val relativeFileName = new File(zkAuthKeytabPath).getName
- if (new File(relativeFileName).exists()) {
- Some(relativeFileName)
- } else {
- Some(zkAuthKeytabPath)
- }
- } else {
- None
+ conf.get(HA_ZK_AUTH_KEYTAB).map { fullPath =>
+ val filename = new File(fullPath).getName
+ if (new File(filename).exists()) filename else fullPath
}
}
-
}
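The `DynConstructors` chain above tries the Hadoop 3.3.5+ class first and falls back to the pre-3.3.5 inner class. The underlying pattern, probing candidate class names and taking the first that loads, can be sketched with plain reflection (illustrative only; `DynConstructors` additionally binds constructor signatures):

```java
public class ReflectiveFallback {
    // Try candidate class names in order and return the first that loads.
    // This mirrors the version-fallback idea used for JaasConfiguration
    // across the HDFS-16591 relocation.
    public static Class<?> firstAvailable(String... names) {
        for (String name : names) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException ignored) {
                // fall through to the next candidate
            }
        }
        throw new IllegalStateException("none of the candidate classes found");
    }
}
```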
diff --git a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClient.scala b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClient.scala
index daa27047eb9..2db7d89d649 100644
--- a/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClient.scala
+++ b/kyuubi-ha/src/main/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClient.scala
@@ -25,39 +25,25 @@ import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.JavaConverters._
import com.google.common.annotations.VisibleForTesting
-import org.apache.curator.framework.CuratorFramework
-import org.apache.curator.framework.recipes.atomic.{AtomicValue, DistributedAtomicInteger}
-import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex
-import org.apache.curator.framework.recipes.nodes.PersistentNode
-import org.apache.curator.framework.state.ConnectionState
-import org.apache.curator.framework.state.ConnectionState.CONNECTED
-import org.apache.curator.framework.state.ConnectionState.LOST
-import org.apache.curator.framework.state.ConnectionState.RECONNECTED
-import org.apache.curator.framework.state.ConnectionStateListener
-import org.apache.curator.retry.RetryForever
-import org.apache.curator.utils.ZKPaths
-import org.apache.zookeeper.CreateMode
-import org.apache.zookeeper.CreateMode.PERSISTENT
-import org.apache.zookeeper.KeeperException
-import org.apache.zookeeper.KeeperException.NodeExistsException
-import org.apache.zookeeper.WatchedEvent
-import org.apache.zookeeper.Watcher
-
-import org.apache.kyuubi.KYUUBI_VERSION
-import org.apache.kyuubi.KyuubiException
-import org.apache.kyuubi.KyuubiSQLException
-import org.apache.kyuubi.Logging
+
+import org.apache.kyuubi.{KYUUBI_VERSION, KyuubiException, KyuubiSQLException, Logging}
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_ENGINE_ID
-import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ENGINE_REF_ID
-import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ZK_NODE_TIMEOUT
-import org.apache.kyuubi.ha.HighAvailabilityConf.HA_ZK_PUBLISH_CONFIGS
-import org.apache.kyuubi.ha.client.DiscoveryClient
-import org.apache.kyuubi.ha.client.ServiceDiscovery
-import org.apache.kyuubi.ha.client.ServiceNodeInfo
-import org.apache.kyuubi.ha.client.zookeeper.ZookeeperClientProvider.buildZookeeperClient
-import org.apache.kyuubi.ha.client.zookeeper.ZookeeperClientProvider.getGracefulStopThreadDelay
+import org.apache.kyuubi.ha.HighAvailabilityConf.{HA_ENGINE_REF_ID, HA_ZK_NODE_TIMEOUT, HA_ZK_PUBLISH_CONFIGS}
+import org.apache.kyuubi.ha.client.{DiscoveryClient, ServiceDiscovery, ServiceNodeInfo}
+import org.apache.kyuubi.ha.client.zookeeper.ZookeeperClientProvider.{buildZookeeperClient, getGracefulStopThreadDelay}
import org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient.connectionChecker
+import org.apache.kyuubi.shaded.curator.framework.CuratorFramework
+import org.apache.kyuubi.shaded.curator.framework.recipes.atomic.{AtomicValue, DistributedAtomicInteger}
+import org.apache.kyuubi.shaded.curator.framework.recipes.locks.InterProcessSemaphoreMutex
+import org.apache.kyuubi.shaded.curator.framework.recipes.nodes.PersistentNode
+import org.apache.kyuubi.shaded.curator.framework.state.{ConnectionState, ConnectionStateListener}
+import org.apache.kyuubi.shaded.curator.framework.state.ConnectionState.{CONNECTED, LOST, RECONNECTED}
+import org.apache.kyuubi.shaded.curator.retry.RetryForever
+import org.apache.kyuubi.shaded.curator.utils.ZKPaths
+import org.apache.kyuubi.shaded.zookeeper.{CreateMode, KeeperException, WatchedEvent, Watcher}
+import org.apache.kyuubi.shaded.zookeeper.CreateMode.PERSISTENT
+import org.apache.kyuubi.shaded.zookeeper.KeeperException.NodeExistsException
import org.apache.kyuubi.util.ThreadUtils
class ZookeeperDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
@@ -226,7 +212,7 @@ class ZookeeperDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
info(s"Get service instance:$instance$engineIdStr and version:${version.getOrElse("")} " +
s"under $namespace")
ServiceNodeInfo(namespace, p, host, port, version, engineRefId, attributes)
- }
+ }.toSeq
} catch {
case _: Exception if silent => Nil
case e: Exception =>
@@ -305,6 +291,10 @@ class ZookeeperDiscoveryClient(conf: KyuubiConf) extends DiscoveryClient {
basePath,
initData.getBytes(StandardCharsets.UTF_8))
secretNode.start()
+ val znodeTimeout = conf.get(HA_ZK_NODE_TIMEOUT)
+ if (!secretNode.waitForInitialCreate(znodeTimeout, TimeUnit.MILLISECONDS)) {
+ throw new KyuubiException(s"Max znode creation wait time $znodeTimeout ms exhausted")
+ }
}
override def getAndIncrement(path: String, delta: Int = 1): Int = {
diff --git a/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/DiscoveryClientTests.scala b/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/DiscoveryClientTests.scala
index 87db340b5fe..9caf3864640 100644
--- a/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/DiscoveryClientTests.scala
+++ b/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/DiscoveryClientTests.scala
@@ -135,17 +135,17 @@ trait DiscoveryClientTests extends KyuubiFunSuite {
new Thread(() => {
withDiscoveryClient(conf) { discoveryClient =>
- discoveryClient.tryWithLock(lockPath, 3000) {
+ discoveryClient.tryWithLock(lockPath, 10000) {
lockLatch.countDown()
- Thread.sleep(5000)
+ Thread.sleep(15000)
}
}
}).start()
withDiscoveryClient(conf) { discoveryClient =>
- assert(lockLatch.await(5000, TimeUnit.MILLISECONDS))
+ assert(lockLatch.await(20000, TimeUnit.MILLISECONDS))
val e = intercept[KyuubiSQLException] {
- discoveryClient.tryWithLock(lockPath, 2000) {}
+ discoveryClient.tryWithLock(lockPath, 5000) {}
}
assert(e.getMessage contains s"Timeout to lock on path [$lockPath]")
}
@@ -162,7 +162,7 @@ trait DiscoveryClientTests extends KyuubiFunSuite {
test("setData method test") {
withDiscoveryClient(conf) { discoveryClient =>
- val data = "abc";
+ val data = "abc"
val path = "/setData_test"
discoveryClient.create(path, "PERSISTENT")
discoveryClient.setData(path, data.getBytes)
diff --git a/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClientSuite.scala b/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClientSuite.scala
index bbd8b94ac7c..dd78e1fb8a0 100644
--- a/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClientSuite.scala
+++ b/kyuubi-ha/src/test/scala/org/apache/kyuubi/ha/client/zookeeper/ZookeeperDiscoveryClientSuite.scala
@@ -25,11 +25,7 @@ import javax.security.auth.login.Configuration
import scala.collection.JavaConverters._
-import org.apache.curator.framework.CuratorFrameworkFactory
-import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.hadoop.util.StringUtils
-import org.apache.zookeeper.ZooDefs
-import org.apache.zookeeper.data.ACL
import org.scalatest.time.SpanSugar._
import org.apache.kyuubi.{KerberizedTestHelper, KYUUBI_VERSION}
@@ -37,7 +33,13 @@ import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.ha.HighAvailabilityConf._
import org.apache.kyuubi.ha.client._
import org.apache.kyuubi.ha.client.DiscoveryClientProvider.withDiscoveryClient
+import org.apache.kyuubi.ha.client.zookeeper.ZookeeperClientProvider._
import org.apache.kyuubi.service._
+import org.apache.kyuubi.shaded.curator.framework.CuratorFrameworkFactory
+import org.apache.kyuubi.shaded.curator.retry.ExponentialBackoffRetry
+import org.apache.kyuubi.shaded.zookeeper.ZooDefs
+import org.apache.kyuubi.shaded.zookeeper.data.ACL
+import org.apache.kyuubi.util.reflect.ReflectUtils._
import org.apache.kyuubi.zookeeper.EmbeddedZookeeper
import org.apache.kyuubi.zookeeper.ZookeeperConf.ZK_CLIENT_PORT
@@ -117,7 +119,7 @@ abstract class ZookeeperDiscoveryClientSuite extends DiscoveryClientTests
conf.set(HA_ZK_AUTH_PRINCIPAL.key, principal)
conf.set(HA_ZK_AUTH_TYPE.key, AuthTypes.KERBEROS.toString)
- ZookeeperClientProvider.setUpZooKeeperAuth(conf)
+ setUpZooKeeperAuth(conf)
val configuration = Configuration.getConfiguration
val entries = configuration.getAppConfigurationEntry("KyuubiZooKeeperClient")
@@ -129,9 +131,9 @@ abstract class ZookeeperDiscoveryClientSuite extends DiscoveryClientTests
assert(options("useKeyTab").toString.toBoolean)
conf.set(HA_ZK_AUTH_KEYTAB.key, s"${keytab.getName}")
- val e = intercept[IOException](ZookeeperClientProvider.setUpZooKeeperAuth(conf))
- assert(e.getMessage ===
- s"${HA_ZK_AUTH_KEYTAB.key}: ${ZookeeperClientProvider.getKeyTabFile(conf)} does not exists")
+ val e = intercept[IOException](setUpZooKeeperAuth(conf))
+ assert(
+ e.getMessage === s"${HA_ZK_AUTH_KEYTAB.key}: ${getKeyTabFile(conf).get} does not exists")
}
}
@@ -155,12 +157,11 @@ abstract class ZookeeperDiscoveryClientSuite extends DiscoveryClientTests
assert(service.getServiceState === ServiceState.STARTED)
stopZk()
- val isServerLostM = discovery.getClass.getSuperclass.getDeclaredField("isServerLost")
- isServerLostM.setAccessible(true)
- val isServerLost = isServerLostM.get(discovery)
+ val isServerLost =
+ getField[AtomicBoolean]((discovery.getClass.getSuperclass, discovery), "isServerLost")
eventually(timeout(10.seconds), interval(100.millis)) {
- assert(isServerLost.asInstanceOf[AtomicBoolean].get())
+ assert(isServerLost.get())
assert(discovery.getServiceState === ServiceState.STOPPED)
assert(service.getServiceState === ServiceState.STOPPED)
}
diff --git a/kyuubi-hive-beeline/pom.xml b/kyuubi-hive-beeline/pom.xml
index beacba438c2..1068a81ce18 100644
--- a/kyuubi-hive-beeline/pom.xml
+++ b/kyuubi-hive-beeline/pom.xml
@@ -21,7 +21,7 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOTkyuubi-hive-beeline
@@ -40,6 +40,12 @@
${project.version}
+
+ org.apache.kyuubi
+ kyuubi-util
+ ${project.version}
+
+
org.apache.hivehive-beeline
@@ -155,6 +161,11 @@
log4j-slf4j-impl
+
+ org.slf4j
+ jul-to-slf4j
+
+
org.apache.logging.log4jlog4j-api
@@ -217,6 +228,14 @@
true
+
+
+ org.apache.maven.plugins
+ maven-surefire-plugin
+
+ ${skipTests}
+
+ target/classestarget/test-classes
diff --git a/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiBeeLine.java b/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiBeeLine.java
index 7ca7671486b..224cbb3ce11 100644
--- a/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiBeeLine.java
+++ b/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiBeeLine.java
@@ -19,22 +19,45 @@
import java.io.IOException;
import java.io.InputStream;
-import java.lang.reflect.Field;
-import java.lang.reflect.Method;
import java.sql.Driver;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.List;
+import java.util.*;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.apache.kyuubi.util.reflect.DynConstructors;
+import org.apache.kyuubi.util.reflect.DynFields;
+import org.apache.kyuubi.util.reflect.DynMethods;
public class KyuubiBeeLine extends BeeLine {
+
+ static {
+ try {
+ // We use reflection here to handle the case where users remove the
+ // jul-to-slf4j bridge in order to route their logs to JUL.
+ Class> bridgeClass = Class.forName("org.slf4j.bridge.SLF4JBridgeHandler");
+ bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null);
+ boolean installed = (boolean) bridgeClass.getMethod("isInstalled").invoke(null);
+ if (!installed) {
+ bridgeClass.getMethod("install").invoke(null);
+ }
+ } catch (ReflectiveOperationException cnf) {
+ // can't log anything yet so just fail silently
+ }
+ }
+
public static final String KYUUBI_BEELINE_DEFAULT_JDBC_DRIVER =
"org.apache.kyuubi.jdbc.KyuubiHiveDriver";
protected KyuubiCommands commands = new KyuubiCommands(this);
- private Driver defaultDriver = null;
+ private Driver defaultDriver;
+
+ // copied from org.apache.hive.beeline.BeeLine
+ private static final int ERRNO_OK = 0;
+ private static final int ERRNO_ARGS = 1;
+ private static final int ERRNO_OTHER = 2;
+
+ private static final String PYTHON_MODE_PREFIX = "--python-mode";
+ private boolean pythonMode = false;
public KyuubiBeeLine() {
this(true);
@@ -44,25 +67,37 @@ public KyuubiBeeLine() {
public KyuubiBeeLine(boolean isBeeLine) {
super(isBeeLine);
try {
- Field commandsField = BeeLine.class.getDeclaredField("commands");
- commandsField.setAccessible(true);
- commandsField.set(this, commands);
+ DynFields.builder().hiddenImpl(BeeLine.class, "commands").buildChecked(this).set(commands);
} catch (Throwable t) {
throw new ExceptionInInitializerError("Failed to inject kyuubi commands");
}
try {
defaultDriver =
- (Driver)
- Class.forName(
- KYUUBI_BEELINE_DEFAULT_JDBC_DRIVER,
- true,
- Thread.currentThread().getContextClassLoader())
- .newInstance();
+ DynConstructors.builder()
+ .impl(KYUUBI_BEELINE_DEFAULT_JDBC_DRIVER)
+ .buildChecked()
+ .newInstance();
} catch (Throwable t) {
throw new ExceptionInInitializerError(KYUUBI_BEELINE_DEFAULT_JDBC_DRIVER + "-missing");
}
}
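`DynFields.builder().hiddenImpl(...)` wraps the plain-reflection dance the old code performed by hand. For reference, that manual version, opening a private field and then setting it, looks like this (a generic sketch, not tied to BeeLine):

```java
import java.lang.reflect.Field;

public class HiddenField {
    private String secret = "init";

    // Manual equivalent of a DynFields hidden-field write: look up a
    // declared (possibly private) field, make it accessible, and set it.
    public static void set(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public String getSecret() { return secret; }
}
```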
+ @Override
+ void usage() {
+ super.usage();
+ output("Usage: java " + KyuubiBeeLine.class.getCanonicalName());
+ output(" --python-mode Execute python code/script.");
+ }
+
+ public boolean isPythonMode() {
+ return pythonMode;
+ }
+
+ // Visible for testing
+ public void setPythonMode(boolean pythonMode) {
+ this.pythonMode = pythonMode;
+ }
+
/** Starts the program. */
public static void main(String[] args) throws IOException {
mainWithInputRedirection(args, null);
@@ -115,25 +150,37 @@ int initArgs(String[] args) {
BeelineParser beelineParser;
boolean connSuccessful;
boolean exit;
- Field exitField;
+ DynFields.BoundField exitField;
try {
- Field optionsField = BeeLine.class.getDeclaredField("options");
- optionsField.setAccessible(true);
- Options options = (Options) optionsField.get(this);
+ Options options =
+ DynFields.builder()
+ .hiddenImpl(BeeLine.class, "options")
+ .buildStaticChecked()
+ .get();
- beelineParser = new BeelineParser();
+ beelineParser =
+ new BeelineParser() {
+ @SuppressWarnings("rawtypes")
+ @Override
+ protected void processOption(String arg, ListIterator iter) throws ParseException {
+ if (PYTHON_MODE_PREFIX.equals(arg)) {
+ pythonMode = true;
+ } else {
+ super.processOption(arg, iter);
+ }
+ }
+ };
cl = beelineParser.parse(options, args);
- Method connectUsingArgsMethod =
- BeeLine.class.getDeclaredMethod(
- "connectUsingArgs", BeelineParser.class, CommandLine.class);
- connectUsingArgsMethod.setAccessible(true);
- connSuccessful = (boolean) connectUsingArgsMethod.invoke(this, beelineParser, cl);
+ connSuccessful =
+ DynMethods.builder("connectUsingArgs")
+ .hiddenImpl(BeeLine.class, BeelineParser.class, CommandLine.class)
+ .buildChecked(this)
+ .invoke(beelineParser, cl);
- exitField = BeeLine.class.getDeclaredField("exit");
- exitField.setAccessible(true);
- exit = (boolean) exitField.get(this);
+ exitField = DynFields.builder().hiddenImpl(BeeLine.class, "exit").buildChecked(this);
+ exit = exitField.get();
} catch (ParseException e1) {
output(e1.getMessage());
@@ -149,10 +196,11 @@ int initArgs(String[] args) {
// no-op if the file is not present
if (!connSuccessful && !exit) {
try {
- Method defaultBeelineConnectMethod =
- BeeLine.class.getDeclaredMethod("defaultBeelineConnect", CommandLine.class);
- defaultBeelineConnectMethod.setAccessible(true);
- connSuccessful = (boolean) defaultBeelineConnectMethod.invoke(this, cl);
+ connSuccessful =
+ DynMethods.builder("defaultBeelineConnect")
+ .hiddenImpl(BeeLine.class, CommandLine.class)
+ .buildChecked(this)
+ .invoke(cl);
} catch (Exception t) {
error(t.getMessage());
@@ -160,6 +208,11 @@ int initArgs(String[] args) {
}
}
+ // see HIVE-19048 : InitScript errors are ignored
+ if (exit) {
+ return 1;
+ }
+
int code = 0;
if (cl.getOptionValues('e') != null) {
commands = Arrays.asList(cl.getOptionValues('e'));
@@ -175,8 +228,7 @@ int initArgs(String[] args) {
return 1;
}
if (!commands.isEmpty()) {
- for (Iterator i = commands.iterator(); i.hasNext(); ) {
- String command = i.next().toString();
+ for (String command : commands) {
debug(loc("executing-command", command));
if (!dispatch(command)) {
code++;
@@ -184,7 +236,7 @@ int initArgs(String[] args) {
}
try {
exit = true;
- exitField.set(this, exit);
+ exitField.set(exit);
} catch (Exception e) {
error(e.getMessage());
return 1;
@@ -192,4 +244,59 @@ int initArgs(String[] args) {
}
return code;
}
+
+ // see HIVE-19048 : Initscript errors are ignored
+ @Override
+ int runInit() {
+ String[] initFiles = getOpts().getInitFiles();
+
+ // executionResult will be ERRNO_OK only if all initFiles execute successfully
+ int executionResult = ERRNO_OK;
+ boolean exitOnError = !getOpts().getForce();
+ DynFields.BoundField exitField = null;
+
+ if (initFiles != null && initFiles.length != 0) {
+ for (String initFile : initFiles) {
+ info("Running init script " + initFile);
+ try {
+ int currentResult;
+ try {
+ currentResult =
+ DynMethods.builder("executeFile")
+ .hiddenImpl(BeeLine.class, String.class)
+ .buildChecked(this)
+ .invoke(initFile);
+ exitField = DynFields.builder().hiddenImpl(BeeLine.class, "exit").buildChecked(this);
+ } catch (Exception t) {
+ error(t.getMessage());
+ currentResult = ERRNO_OTHER;
+ }
+
+ if (currentResult != ERRNO_OK) {
+ executionResult = currentResult;
+
+ if (exitOnError) {
+ return executionResult;
+ }
+ }
+ } finally {
+ // exit beeline if there is initScript failure and --force is not set
+ boolean exit = exitOnError && executionResult != ERRNO_OK;
+ try {
+ exitField.set(exit);
+ } catch (Exception t) {
+ error(t.getMessage());
+ return ERRNO_OTHER;
+ }
+ }
+ }
+ }
+ return executionResult;
+ }
+
+ // see HIVE-15820: comment at the head of beeline -e
+ @Override
+ boolean dispatch(String line) {
+ return super.dispatch(isPythonMode() ? line : HiveStringUtils.removeComments(line));
+ }
}
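The `DynFields`/`DynMethods` builders introduced above replace hand-rolled `java.lang.reflect` lookups against BeeLine's hidden members. A minimal sketch of the raw-reflection pattern they wrap; `Base`, its `exit` field, and its `connect` method are illustrative stand-ins, not BeeLine's real API:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionSketch {
    public static class Base {
        private boolean exit = false;
        private boolean connect(String url) { return url.startsWith("jdbc:"); }
    }

    // Like DynFields...buildChecked(target): resolve the hidden field once,
    // then read/write it without repeating the lookup.
    public static boolean setAndReadExit(Base target, boolean value) {
        try {
            Field exitField = Base.class.getDeclaredField("exit");
            exitField.setAccessible(true);
            exitField.set(target, value);
            return (boolean) exitField.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Like DynMethods.builder("connect").hiddenImpl(...).buildChecked(...).invoke(...).
    public static boolean invokeConnect(Base target, String url) {
        try {
            Method m = Base.class.getDeclaredMethod("connect", String.class);
            m.setAccessible(true);
            return (boolean) m.invoke(target, url);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Base b = new Base();
        System.out.println(setAndReadExit(b, true));                   // true
        System.out.println(invokeConnect(b, "jdbc:hive2://h:10009"));  // true
    }
}
```

The builders add what the raw pattern lacks: the lookup happens once at `buildChecked` time (failing fast with a checked exception), and the bound field/method can then be reused without further `setAccessible` calls.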
diff --git a/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiCommands.java b/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiCommands.java
index 311cb6a9538..fcfee49edb0 100644
--- a/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiCommands.java
+++ b/kyuubi-hive-beeline/src/main/java/org/apache/hive/beeline/KyuubiCommands.java
@@ -21,9 +21,11 @@
import com.google.common.annotations.VisibleForTesting;
import java.io.*;
+import java.nio.file.Files;
import java.sql.*;
import java.util.*;
import org.apache.hive.beeline.logs.KyuubiBeelineInPlaceUpdateStream;
+import org.apache.hive.common.util.HiveStringUtils;
import org.apache.kyuubi.jdbc.hive.KyuubiStatement;
import org.apache.kyuubi.jdbc.hive.Utils;
import org.apache.kyuubi.jdbc.hive.logs.InPlaceUpdateStream;
@@ -44,9 +46,14 @@ public boolean sql(String line) {
return execute(line, false, false);
}
+ /** For python mode, keep it as it is. */
+ private String trimForNonPythonMode(String line) {
+ return beeLine.isPythonMode() ? line : line.trim();
+ }
+
/** Extract and clean up the first command in the input. */
private String getFirstCmd(String cmd, int length) {
- return cmd.substring(length).trim();
+ return trimForNonPythonMode(cmd.substring(length));
}
private String[] tokenizeCmd(String cmd) {
@@ -80,10 +87,9 @@ private boolean sourceFile(String cmd) {
}
private boolean sourceFileInternal(File sourceFile) throws IOException {
- BufferedReader reader = null;
- try {
- reader = new BufferedReader(new FileReader(sourceFile));
- String lines = null, extra;
+ try (BufferedReader reader = Files.newBufferedReader(sourceFile.toPath())) {
+ String lines = null;
+ String extra;
while ((extra = reader.readLine()) != null) {
if (beeLine.isComment(extra)) {
continue;
@@ -96,15 +102,11 @@ private boolean sourceFileInternal(File sourceFile) throws IOException {
}
String[] cmds = lines.split(beeLine.getOpts().getDelimiter());
for (String c : cmds) {
- c = c.trim();
+ c = trimForNonPythonMode(c);
if (!executeInternal(c, false)) {
return false;
}
}
- } finally {
- if (reader != null) {
- reader.close();
- }
}
return true;
}
@@ -260,9 +262,10 @@ private boolean execute(String line, boolean call, boolean entireLineAsCommand)
beeLine.handleException(e);
}
+ line = trimForNonPythonMode(line);
List<String> cmdList = getCmdList(line, entireLineAsCommand);
for (int i = 0; i < cmdList.size(); i++) {
- String sql = cmdList.get(i);
+ String sql = trimForNonPythonMode(cmdList.get(i));
if (sql.length() != 0) {
if (!executeInternal(sql, call)) {
return false;
@@ -355,7 +358,7 @@ public List<String> getCmdList(String line, boolean entireLineAsCommand) {
*/
private void addCmdPart(List<String> cmdList, StringBuilder command, String cmdpart) {
if (cmdpart.endsWith("\\")) {
- command.append(cmdpart.substring(0, cmdpart.length() - 1)).append(";");
+ command.append(cmdpart, 0, cmdpart.length() - 1).append(";");
return;
} else {
command.append(cmdpart);
@@ -420,6 +423,7 @@ private String getProperty(Properties props, String[] keys) {
return null;
}
+ @Override
public boolean connect(Properties props) throws IOException {
String url =
getProperty(
@@ -465,7 +469,7 @@ public boolean connect(Properties props) throws IOException {
beeLine.info("Connecting to " + url);
if (Utils.parsePropertyFromUrl(url, AUTH_PRINCIPAL) == null
- || Utils.parsePropertyFromUrl(url, AUTH_KYUUBI_SERVER_PRINCIPAL) == null) {
+ && Utils.parsePropertyFromUrl(url, AUTH_KYUUBI_SERVER_PRINCIPAL) == null) {
String urlForPrompt = url.substring(0, url.contains(";") ? url.indexOf(';') : url.length());
if (username == null) {
username = beeLine.getConsoleReader().readLine("Enter username for " + urlForPrompt + ": ");
@@ -487,7 +491,16 @@ public boolean connect(Properties props) throws IOException {
if (!beeLine.isBeeLine()) {
beeLine.updateOptsForCli();
}
- beeLine.runInit();
+
+ // see HIVE-19048 : Initscript errors are ignored
+ int initScriptExecutionResult = beeLine.runInit();
+
+ // if execution of the init script(s) return anything other than ERRNO_OK from beeline
+ // exit beeline with error unless --force is set
+ if (initScriptExecutionResult != 0 && !beeLine.getOpts().getForce()) {
+ return beeLine.error("init script execution failed.");
+ }
+
if (beeLine.getOpts().getInitFiles() != null) {
beeLine.initializeConsoleReader(null);
}
@@ -505,12 +518,14 @@ public boolean connect(Properties props) throws IOException {
@Override
public String handleMultiLineCmd(String line) throws IOException {
- int[] startQuote = {-1};
Character mask =
(System.getProperty("jline.terminal", "").equals("jline.UnsupportedTerminal"))
? null
: jline.console.ConsoleReader.NULL_MASK;
+ if (!beeLine.isPythonMode()) {
+ line = HiveStringUtils.removeComments(line);
+ }
while (isMultiLine(line) && beeLine.getOpts().isAllowMultiLineCommand()) {
StringBuilder prompt = new StringBuilder(beeLine.getPrompt());
if (!beeLine.getOpts().isSilent()) {
@@ -536,6 +551,9 @@ public String handleMultiLineCmd(String line) throws IOException {
if (extra == null) { // it happens when using -f and the line of cmds does not end with ;
break;
}
+ if (!beeLine.isPythonMode()) {
+ extra = HiveStringUtils.removeComments(extra);
+ }
if (!extra.isEmpty()) {
line += "\n" + extra;
}
@@ -547,12 +565,13 @@ public String handleMultiLineCmd(String line) throws IOException {
// console. Used in handleMultiLineCmd method assumes line would never be null when this method is
// called
private boolean isMultiLine(String line) {
+ line = trimForNonPythonMode(line);
if (line.endsWith(beeLine.getOpts().getDelimiter()) || beeLine.isComment(line)) {
return false;
}
// handles the case like line = show tables; --test comment
List<String> cmds = getCmdList(line, false);
- return cmds.isEmpty() || !cmds.get(cmds.size() - 1).startsWith("--");
+ return cmds.isEmpty() || !trimForNonPythonMode(cmds.get(cmds.size() - 1)).startsWith("--");
}
static class KyuubiLogRunnable implements Runnable {
diff --git a/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiBeeLineTest.java b/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiBeeLineTest.java
index b144c95c61f..9c7aec35a42 100644
--- a/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiBeeLineTest.java
+++ b/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiBeeLineTest.java
@@ -19,7 +19,12 @@
package org.apache.hive.beeline;
import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.PrintStream;
+import org.apache.kyuubi.util.reflect.DynFields;
import org.junit.Test;
public class KyuubiBeeLineTest {
@@ -29,4 +34,104 @@ public void testKyuubiBeelineWithoutArgs() {
int result = kyuubiBeeLine.initArgs(new String[0]);
assertEquals(0, result);
}
+
+ @Test
+ public void testKyuubiBeelineExitCodeWithoutConnection() {
+ KyuubiBeeLine kyuubiBeeLine = new KyuubiBeeLine();
+ String scriptFile = getClass().getClassLoader().getResource("test.sql").getFile();
+
+ String[] args1 = {"-u", "badUrl", "-e", "show tables"};
+ int result1 = kyuubiBeeLine.initArgs(args1);
+ assertEquals(1, result1);
+
+ String[] args2 = {"-u", "badUrl", "-f", scriptFile};
+ int result2 = kyuubiBeeLine.initArgs(args2);
+ assertEquals(1, result2);
+
+ String[] args3 = {"-u", "badUrl", "-i", scriptFile};
+ int result3 = kyuubiBeeLine.initArgs(args3);
+ assertEquals(1, result3);
+ }
+
+ @Test
+ public void testKyuubiBeeLineCmdUsage() {
+ BufferPrintStream printStream = new BufferPrintStream();
+
+ KyuubiBeeLine kyuubiBeeLine = new KyuubiBeeLine();
+ DynFields.builder()
+ .hiddenImpl(BeeLine.class, "outputStream")
+ .build(kyuubiBeeLine)
+ .set(printStream);
+ String[] args1 = {"-h"};
+ kyuubiBeeLine.initArgs(args1);
+ String output = printStream.getOutput();
+ assert output.contains("--python-mode Execute python code/script.");
+ }
+
+ @Test
+ public void testKyuubiBeeLinePythonMode() {
+ KyuubiBeeLine kyuubiBeeLine = new KyuubiBeeLine();
+ String[] args1 = {"-u", "badUrl", "--python-mode"};
+ kyuubiBeeLine.initArgs(args1);
+ assertTrue(kyuubiBeeLine.isPythonMode());
+ kyuubiBeeLine.setPythonMode(false);
+
+ String[] args2 = {"--python-mode", "-f", "test.sql"};
+ kyuubiBeeLine.initArgs(args2);
+ assertTrue(kyuubiBeeLine.isPythonMode());
+ assert kyuubiBeeLine.getOpts().getScriptFile().equals("test.sql");
+ kyuubiBeeLine.setPythonMode(false);
+
+ String[] args3 = {"-u", "badUrl"};
+ kyuubiBeeLine.initArgs(args3);
+ assertTrue(!kyuubiBeeLine.isPythonMode());
+ kyuubiBeeLine.setPythonMode(false);
+ }
+
+ @Test
+ public void testKyuubiBeelineComment() {
+ KyuubiBeeLine kyuubiBeeLine = new KyuubiBeeLine();
+ int result = kyuubiBeeLine.initArgsFromCliVars(new String[] {"-e", "--comment show database;"});
+ assertEquals(0, result);
+ result = kyuubiBeeLine.initArgsFromCliVars(new String[] {"-e", "--comment\n show database;"});
+ assertEquals(1, result);
+ result =
+ kyuubiBeeLine.initArgsFromCliVars(
+ new String[] {"-e", "--comment line 1 \n --comment line 2 \n show database;"});
+ assertEquals(1, result);
+ }
+
+ static class BufferPrintStream extends PrintStream {
+ public StringBuilder stringBuilder = new StringBuilder();
+
+ static OutputStream noOpOutputStream =
+ new OutputStream() {
+ @Override
+ public void write(int b) throws IOException {
+ // do nothing
+ }
+ };
+
+ public BufferPrintStream() {
+ super(noOpOutputStream);
+ }
+
+ public BufferPrintStream(OutputStream outputStream) {
+ super(noOpOutputStream);
+ }
+
+ @Override
+ public void println(String x) {
+ stringBuilder.append(x).append("\n");
+ }
+
+ @Override
+ public void print(String x) {
+ stringBuilder.append(x);
+ }
+
+ public String getOutput() {
+ return stringBuilder.toString();
+ }
+ }
}
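The `BufferPrintStream` helper above captures only the `println(String)`/`print(String)` overloads it overrides. A standard-library alternative is to back a `PrintStream` with a `ByteArrayOutputStream`, which captures every write, including `printf`; this is a generic sketch, not the test's actual helper:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class CaptureSketch {
    // Route output through an in-memory buffer and return what was written.
    public static String capture(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        out.println(message);              // stands in for BeeLine's usage output
        out.printf("exit code: %d%n", 0);  // formatted writes are captured too
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String output = capture("--python-mode  Execute python code/script.");
        System.out.println(output.contains("--python-mode")); // true
    }
}
```

The custom subclass is still handy when the code under test holds a direct reference to the stream and you only care about line-oriented output, but the buffer-backed stream is the safer drop-in when arbitrary write paths matter.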
diff --git a/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiCommandsTest.java b/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiCommandsTest.java
index ecb8d65f502..653d1b08f55 100644
--- a/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiCommandsTest.java
+++ b/kyuubi-hive-beeline/src/test/java/org/apache/hive/beeline/KyuubiCommandsTest.java
@@ -34,6 +34,7 @@ public void testParsePythonSnippets() throws IOException {
Mockito.when(reader.readLine()).thenReturn(pythonSnippets);
KyuubiBeeLine beeline = new KyuubiBeeLine();
+ beeline.setPythonMode(true);
beeline.setConsoleReader(reader);
KyuubiCommands commands = new KyuubiCommands(beeline);
String line = commands.handleMultiLineCmd(pythonSnippets);
@@ -42,4 +43,29 @@ public void testParsePythonSnippets() throws IOException {
assertEquals(cmdList.size(), 1);
assertEquals(cmdList.get(0), pythonSnippets);
}
+
+ @Test
+ public void testHandleMultiLineCmd() throws IOException {
+ ConsoleReader reader = Mockito.mock(ConsoleReader.class);
+ String snippets = "select 1;--comments1\nselect 2;--comments2";
+ Mockito.when(reader.readLine()).thenReturn(snippets);
+
+ KyuubiBeeLine beeline = new KyuubiBeeLine();
+ beeline.setConsoleReader(reader);
+ beeline.setPythonMode(false);
+ KyuubiCommands commands = new KyuubiCommands(beeline);
+ String line = commands.handleMultiLineCmd(snippets);
+ List<String> cmdList = commands.getCmdList(line, false);
+ assertEquals(cmdList.size(), 2);
+ assertEquals(cmdList.get(0), "select 1");
+ assertEquals(cmdList.get(1), "\nselect 2");
+
+ // see HIVE-15820: comment at the head of beeline -e
+ snippets = "--comments1\nselect 2;--comments2";
+ Mockito.when(reader.readLine()).thenReturn(snippets);
+ line = commands.handleMultiLineCmd(snippets);
+ cmdList = commands.getCmdList(line, false);
+ assertEquals(cmdList.size(), 1);
+ assertEquals(cmdList.get(0), "select 2");
+ }
}
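The tests above depend on `HiveStringUtils.removeComments` stripping trailing `--` comments before dispatch (HIVE-15820). A simplified sketch of that behavior for a single line; this is not Hive's actual implementation, just the core rule that a `--` marker inside a quoted string must be left alone:

```java
public class CommentStripSketch {
    // Drop a trailing "--" comment unless the marker sits inside a
    // single- or double-quoted string literal.
    public static String removeLineComment(String line) {
        boolean inSingle = false, inDouble = false;
        for (int i = 0; i < line.length() - 1; i++) {
            char c = line.charAt(i);
            if (c == '\'' && !inDouble) {
                inSingle = !inSingle;
            } else if (c == '"' && !inSingle) {
                inDouble = !inDouble;
            } else if (c == '-' && line.charAt(i + 1) == '-' && !inSingle && !inDouble) {
                return line.substring(0, i);
            }
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(removeLineComment("select 1;--comments1"));   // select 1;
        System.out.println(removeLineComment("select '--not a comment'"));
    }
}
```

This also shows why the patch gates the stripping on `!isPythonMode()`: in Python snippets `--` is a legitimate operator sequence, so comment removal must be skipped entirely.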
diff --git a/kyuubi-hive-beeline/src/test/resources/test.sql b/kyuubi-hive-beeline/src/test/resources/test.sql
new file mode 100644
index 00000000000..c7c3ee2f92b
--- /dev/null
+++ b/kyuubi-hive-beeline/src/test/resources/test.sql
@@ -0,0 +1,17 @@
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
+
+show tables;
diff --git a/kyuubi-hive-jdbc-shaded/pom.xml b/kyuubi-hive-jdbc-shaded/pom.xml
index 1a6f258b02f..174f199bead 100644
--- a/kyuubi-hive-jdbc-shaded/pom.xml
+++ b/kyuubi-hive-jdbc-shaded/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
     </parent>
 
     <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
@@ -108,10 +108,6 @@
                                     <pattern>org.apache.commons</pattern>
                                     <shadedPattern>${kyuubi.shade.packageName}.org.apache.commons</shadedPattern>
                                 </relocation>
-                                <relocation>
-                                    <pattern>org.apache.curator</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.curator</shadedPattern>
-                                </relocation>
                                 <relocation>
                                     <pattern>org.apache.hive</pattern>
                                     <shadedPattern>${kyuubi.shade.packageName}.org.apache.hive</shadedPattern>
@@ -120,18 +116,10 @@
                                     <pattern>org.apache.http</pattern>
                                     <shadedPattern>${kyuubi.shade.packageName}.org.apache.http</shadedPattern>
                                 </relocation>
-                                <relocation>
-                                    <pattern>org.apache.jute</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.jute</shadedPattern>
-                                </relocation>
                                 <relocation>
                                     <pattern>org.apache.thrift</pattern>
                                     <shadedPattern>${kyuubi.shade.packageName}.org.apache.thrift</shadedPattern>
                                 </relocation>
-                                <relocation>
-                                    <pattern>org.apache.zookeeper</pattern>
-                                    <shadedPattern>${kyuubi.shade.packageName}.org.apache.zookeeper</shadedPattern>
-                                </relocation>
diff --git a/kyuubi-hive-jdbc/pom.xml b/kyuubi-hive-jdbc/pom.xml
index 36ea7acc274..aa5e7c161d5 100644
--- a/kyuubi-hive-jdbc/pom.xml
+++ b/kyuubi-hive-jdbc/pom.xml
@@ -21,7 +21,7 @@
org.apache.kyuubikyuubi-parent
- 1.8.0-SNAPSHOT
+ 1.9.0-SNAPSHOTkyuubi-hive-jdbc
@@ -35,6 +35,11 @@
+
+ org.apache.kyuubi
+ kyuubi-util
+ ${project.version}
+ org.apache.arrow
@@ -102,24 +107,14 @@
provided
-
- org.apache.curator
- curator-framework
-
-
-
- org.apache.curator
- curator-client
-
-
org.apache.httpcomponentshttpclient
- org.apache.zookeeper
- zookeeper
+ org.apache.kyuubi
+ ${kyuubi-shaded-zookeeper.artifacts}
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/KyuubiHiveDriver.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/KyuubiHiveDriver.java
index 3b874ba2e3a..66b797087e5 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/KyuubiHiveDriver.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/KyuubiHiveDriver.java
@@ -24,6 +24,7 @@
import java.util.jar.Attributes;
import java.util.jar.Manifest;
import java.util.logging.Logger;
+import org.apache.commons.lang3.StringUtils;
import org.apache.kyuubi.jdbc.hive.JdbcConnectionParams;
import org.apache.kyuubi.jdbc.hive.KyuubiConnection;
import org.apache.kyuubi.jdbc.hive.KyuubiSQLException;
@@ -137,7 +138,7 @@ private Properties parseURLForPropertyInfo(String url, Properties defaults) thro
host = "";
}
String port = Integer.toString(params.getPort());
- if (host.equals("")) {
+ if (StringUtils.isEmpty(host)) {
port = "";
} else if (port.equals("0") || port.equals("-1")) {
port = DEFAULT_PORT;
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiArrowQueryResultSet.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiArrowQueryResultSet.java
index fda70f463e9..54491b2d670 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiArrowQueryResultSet.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiArrowQueryResultSet.java
@@ -250,9 +250,6 @@ private void retrieveSchema() throws SQLException {
metadataResp = client.GetResultSetMetadata(metadataReq);
Utils.verifySuccess(metadataResp.getStatus());
- StringBuilder namesSb = new StringBuilder();
- StringBuilder typesSb = new StringBuilder();
-
TTableSchema schema = metadataResp.getSchema();
if (schema == null || !schema.isSetColumns()) {
// TODO: should probably throw an exception here.
@@ -262,10 +259,6 @@ private void retrieveSchema() throws SQLException {
List<TColumnDesc> columns = schema.getColumns();
for (int pos = 0; pos < schema.getColumnsSize(); pos++) {
- if (pos != 0) {
- namesSb.append(",");
- typesSb.append(",");
- }
String columnName = columns.get(pos).getColumnName();
columnNames.add(columnName);
normalizedColumnNames.add(columnName.toLowerCase());
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiDatabaseMetaData.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiDatabaseMetaData.java
index f5e29f8e7d6..c6ab3a277c4 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiDatabaseMetaData.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiDatabaseMetaData.java
@@ -531,7 +531,7 @@ public ResultSet getProcedureColumns(
@Override
public String getProcedureTerm() throws SQLException {
- return new String("UDF");
+ return "UDF";
}
@Override
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiPreparedStatement.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiPreparedStatement.java
index a0d4f3bfd25..1e53f940157 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiPreparedStatement.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiPreparedStatement.java
@@ -168,7 +168,7 @@ public void setObject(int parameterIndex, Object x) throws SQLException {
// Can't infer a type.
throw new KyuubiSQLException(
MessageFormat.format(
- "Can't infer the SQL type to use for an instance of {0}. Use setObject() with an explicit Types value to specify the type to use.",
+ "Cannot infer the SQL type to use for an instance of {0}. Use setObject() with an explicit Types value to specify the type to use.",
x.getClass().getName()));
}
}
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiQueryResultSet.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiQueryResultSet.java
index f06ada5d4be..242ec772021 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiQueryResultSet.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiQueryResultSet.java
@@ -26,6 +26,7 @@
import org.apache.kyuubi.jdbc.hive.cli.RowSet;
import org.apache.kyuubi.jdbc.hive.cli.RowSetFactory;
import org.apache.kyuubi.jdbc.hive.common.HiveDecimal;
+import org.apache.thrift.TException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -47,6 +48,7 @@ public class KyuubiQueryResultSet extends KyuubiBaseResultSet {
private boolean emptyResultSet = false;
private boolean isScrollable = false;
private boolean fetchFirst = false;
+ private boolean hasMoreToFetch = false;
private final TProtocolVersion protocol;
@@ -223,9 +225,6 @@ private void retrieveSchema() throws SQLException {
metadataResp = client.GetResultSetMetadata(metadataReq);
Utils.verifySuccess(metadataResp.getStatus());
- StringBuilder namesSb = new StringBuilder();
- StringBuilder typesSb = new StringBuilder();
-
TTableSchema schema = metadataResp.getSchema();
if (schema == null || !schema.isSetColumns()) {
// TODO: should probably throw an exception here.
@@ -235,10 +234,6 @@ private void retrieveSchema() throws SQLException {
List<TColumnDesc> columns = schema.getColumns();
for (int pos = 0; pos < schema.getColumnsSize(); pos++) {
- if (pos != 0) {
- namesSb.append(",");
- typesSb.append(",");
- }
String columnName = columns.get(pos).getColumnName();
columnNames.add(columnName);
normalizedColumnNames.add(columnName.toLowerCase());
@@ -324,25 +319,20 @@ public boolean next() throws SQLException {
try {
TFetchOrientation orientation = TFetchOrientation.FETCH_NEXT;
if (fetchFirst) {
- // If we are asked to start from begining, clear the current fetched resultset
+ // If we are asked to start from beginning, clear the current fetched resultset
orientation = TFetchOrientation.FETCH_FIRST;
fetchedRows = null;
fetchedRowsItr = null;
fetchFirst = false;
}
if (fetchedRows == null || !fetchedRowsItr.hasNext()) {
- TFetchResultsReq fetchReq = new TFetchResultsReq(stmtHandle, orientation, fetchSize);
- TFetchResultsResp fetchResp;
- fetchResp = client.FetchResults(fetchReq);
- Utils.verifySuccessWithInfo(fetchResp.getStatus());
-
- TRowSet results = fetchResp.getResults();
- fetchedRows = RowSetFactory.create(results, protocol);
- fetchedRowsItr = fetchedRows.iterator();
+ fetchResult(orientation);
}
if (fetchedRowsItr.hasNext()) {
row = fetchedRowsItr.next();
+ } else if (hasMoreToFetch) {
+ fetchResult(orientation);
} else {
return false;
}
@@ -357,6 +347,18 @@ public boolean next() throws SQLException {
return true;
}
+ private void fetchResult(TFetchOrientation orientation) throws SQLException, TException {
+ TFetchResultsReq fetchReq = new TFetchResultsReq(stmtHandle, orientation, fetchSize);
+ TFetchResultsResp fetchResp;
+ fetchResp = client.FetchResults(fetchReq);
+ Utils.verifySuccessWithInfo(fetchResp.getStatus());
+ hasMoreToFetch = fetchResp.isSetHasMoreRows() && fetchResp.isHasMoreRows();
+
+ TRowSet results = fetchResp.getResults();
+ fetchedRows = RowSetFactory.create(results, protocol);
+ fetchedRowsItr = fetchedRows.iterator();
+ }
+
@Override
public ResultSetMetaData getMetaData() throws SQLException {
if (isClosed) {
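The `next()`/`fetchResult()` refactor above turns row fetching into a resumable loop driven by the server's `hasMoreRows` flag: when the current batch is drained but the server reported more rows, the client issues another fetch instead of returning `false`. The control flow can be sketched as follows; `Page` stands in for `TFetchResultsResp` and the server is simulated, so all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FetchLoopSketch {
    // One server round-trip: a batch of rows plus a has-more flag.
    static class Page {
        final List<Integer> rows;
        final boolean hasMore;
        Page(List<Integer> rows, boolean hasMore) { this.rows = rows; this.hasMore = hasMore; }
    }

    // Keep fetching while the current batch is exhausted but the server says
    // more rows exist -- mirroring next() falling through to fetchResult().
    public static List<Integer> drain(Iterator<List<Integer>> serverPages) {
        List<Integer> out = new ArrayList<>();
        while (true) {
            Page page = new Page(serverPages.next(), serverPages.hasNext());
            out.addAll(page.rows);
            if (!page.hasMore) {
                return out;
            }
        }
    }

    public static void main(String[] args) {
        // Three round-trips: two non-empty batches, then an empty final batch.
        List<List<Integer>> batches =
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.<Integer>asList());
        System.out.println(drain(batches.iterator())); // [1, 2, 3]
    }
}
```

The empty final batch is the case the original code mishandled: an empty `fetchedRows` iterator used to end the result set even when the server still had rows buffered, which the new `hasMoreToFetch` branch corrects.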
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiSQLException.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiSQLException.java
index 1ac0adf04ac..7d26f807898 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiSQLException.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/KyuubiSQLException.java
@@ -21,6 +21,7 @@
import java.util.ArrayList;
import java.util.List;
import org.apache.hive.service.rpc.thrift.TStatus;
+import org.apache.kyuubi.util.reflect.DynConstructors;
public class KyuubiSQLException extends SQLException {
@@ -186,7 +187,10 @@ private static Throwable toStackTrace(
private static Throwable newInstance(String className, String message) {
try {
- return (Throwable) Class.forName(className).getConstructor(String.class).newInstance(message);
+ return DynConstructors.builder()
+ .impl(className, String.class)
+ .buildChecked()
+ .newInstance(message);
} catch (Exception e) {
return new RuntimeException(className + ":" + message);
}
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/Utils.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/Utils.java
index ac9b29664c0..d0167e3e490 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/Utils.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/Utils.java
@@ -126,7 +126,7 @@ static List<String> splitSqlStatement(String sql) {
break;
}
}
- parts.add(sql.substring(off, sql.length()));
+ parts.add(sql.substring(off));
return parts;
}
@@ -551,7 +551,10 @@ public static synchronized String getVersion() {
if (KYUUBI_CLIENT_VERSION == null) {
try {
Properties prop = new Properties();
- prop.load(Utils.class.getClassLoader().getResourceAsStream("version.properties"));
+ prop.load(
+ Utils.class
+ .getClassLoader()
+ .getResourceAsStream("org/apache/kyuubi/version.properties"));
KYUUBI_CLIENT_VERSION = prop.getProperty(KYUUBI_CLIENT_VERSION_KEY, "unknown");
} catch (Exception e) {
LOG.error("Error getting kyuubi client version", e);
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/ZooKeeperHiveClientHelper.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/ZooKeeperHiveClientHelper.java
index 41fadfa2f68..948fd333463 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/ZooKeeperHiveClientHelper.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/ZooKeeperHiveClientHelper.java
@@ -22,12 +22,12 @@
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
-import java.util.Random;
+import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
-import org.apache.curator.framework.CuratorFramework;
-import org.apache.curator.framework.CuratorFrameworkFactory;
-import org.apache.curator.retry.ExponentialBackoffRetry;
+import org.apache.kyuubi.shaded.curator.framework.CuratorFramework;
+import org.apache.kyuubi.shaded.curator.framework.CuratorFrameworkFactory;
+import org.apache.kyuubi.shaded.curator.retry.ExponentialBackoffRetry;
class ZooKeeperHiveClientHelper {
// Pattern for key1=value1;key2=value2
@@ -111,7 +111,7 @@ static void configureConnParams(JdbcConnectionParams connParams)
try (CuratorFramework zooKeeperClient = getZkClient(connParams)) {
List<String> serverHosts = getServerHosts(connParams, zooKeeperClient);
// Now pick a server node randomly
- String serverNode = serverHosts.get(new Random().nextInt(serverHosts.size()));
+ String serverNode = serverHosts.get(ThreadLocalRandom.current().nextInt(serverHosts.size()));
updateParamsWithZKServerNode(connParams, zooKeeperClient, serverNode);
} catch (Exception e) {
throw new ZooKeeperHiveClientException(
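The `new Random()` to `ThreadLocalRandom.current()` swap above avoids allocating and seeding a fresh `Random` on every connection attempt, and sidesteps contention on a shared `Random` under concurrent use. The server-node pick reduces to a one-liner; `pick` is a sketch of the helper's selection step, not Kyuubi's API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class RandomPickSketch {
    // Pick a random server node from the hosts discovered in ZooKeeper.
    // ThreadLocalRandom needs no per-call allocation and no synchronization.
    public static String pick(List<String> hosts) {
        return hosts.get(ThreadLocalRandom.current().nextInt(hosts.size()));
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1:10009", "host2:10009", "host3:10009");
        System.out.println(hosts.contains(pick(hosts))); // true
    }
}
```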
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpKerberosRequestInterceptor.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpKerberosRequestInterceptor.java
index 278cef0b4a7..02d168c3f5b 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpKerberosRequestInterceptor.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpKerberosRequestInterceptor.java
@@ -65,7 +65,7 @@ protected void addHttpAuthHeader(HttpRequest httpRequest, HttpContext httpContex
httpRequest.addHeader(
HttpAuthUtils.AUTHORIZATION, HttpAuthUtils.NEGOTIATE + " " + kerberosAuthHeader);
} catch (Exception e) {
- throw new HttpException(e.getMessage(), e);
+ throw new HttpException(e.getMessage() == null ? "" : e.getMessage(), e);
} finally {
kerberosLock.unlock();
}
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpRequestInterceptorBase.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpRequestInterceptorBase.java
index 9ce5a330b7c..42641c219c9 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpRequestInterceptorBase.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/auth/HttpRequestInterceptorBase.java
@@ -110,7 +110,7 @@ public void process(HttpRequest httpRequest, HttpContext httpContext)
httpRequest.addHeader("Cookie", cookieHeaderKeyValues.toString());
}
} catch (Exception e) {
- throw new HttpException(e.getMessage(), e);
+ throw new HttpException(e.getMessage() == null ? "" : e.getMessage(), e);
}
}
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/cli/ColumnBuffer.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/cli/ColumnBuffer.java
index e703cb1f00c..bd5124f9524 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/cli/ColumnBuffer.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/cli/ColumnBuffer.java
@@ -228,8 +228,9 @@ public Object get(int index) {
return stringVars.get(index);
case BINARY_TYPE:
return binaryVars.get(index).array();
+ default:
+ return null;
}
- return null;
}
@Override
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Date.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Date.java
index 1b49c268a4b..720c7517f52 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Date.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Date.java
@@ -65,6 +65,7 @@ public String toString() {
return localDate.format(PRINT_FORMATTER);
}
+ @Override
public int hashCode() {
return localDate.hashCode();
}
@@ -164,6 +165,7 @@ public int getDayOfWeek() {
}
/** Return a copy of this object. */
+ @Override
public Object clone() {
// LocalDateTime is immutable.
return new Date(this.localDate);
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/FastHiveDecimalImpl.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/FastHiveDecimalImpl.java
index d3dba0f7b7a..65f17e73443 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/FastHiveDecimalImpl.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/FastHiveDecimalImpl.java
@@ -5182,7 +5182,6 @@ public static boolean fastRoundIntegerDown(
fastResult.fastIntegerDigitCount = 0;
fastResult.fastScale = 0;
} else {
- fastResult.fastSignum = 0;
fastResult.fastSignum = fastSignum;
fastResult.fastIntegerDigitCount = fastRawPrecision(fastResult);
fastResult.fastScale = 0;
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Timestamp.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Timestamp.java
index cdb6b10ce52..7e02835b748 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Timestamp.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/Timestamp.java
@@ -95,6 +95,7 @@ public String toString() {
return localDateTime.format(PRINT_FORMATTER);
}
+ @Override
public int hashCode() {
return localDateTime.hashCode();
}
@@ -207,6 +208,7 @@ public int getDayOfWeek() {
}
/** Return a copy of this object. */
+ @Override
public Object clone() {
// LocalDateTime is immutable.
return new Timestamp(this.localDateTime);
diff --git a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/TimestampTZUtil.java b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/TimestampTZUtil.java
index a938e16889a..be16926cbe3 100644
--- a/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/TimestampTZUtil.java
+++ b/kyuubi-hive-jdbc/src/main/java/org/apache/kyuubi/jdbc/hive/common/TimestampTZUtil.java
@@ -98,7 +98,7 @@ private static String handleSingleDigitHourOffset(String s) {
Matcher matcher = SINGLE_DIGIT_PATTERN.matcher(s);
if (matcher.find()) {
int index = matcher.start() + 1;
- s = s.substring(0, index) + "0" + s.substring(index, s.length());
+ s = s.substring(0, index) + "0" + s.substring(index);
}
return s;
}
diff --git a/kyuubi-hive-jdbc/src/main/resources/version.properties b/kyuubi-hive-jdbc/src/main/resources/org/apache/kyuubi/version.properties
similarity index 100%
rename from kyuubi-hive-jdbc/src/main/resources/version.properties
rename to kyuubi-hive-jdbc/src/main/resources/org/apache/kyuubi/version.properties
diff --git a/kyuubi-hive-jdbc/src/test/java/org/apache/kyuubi/jdbc/hive/TestJdbcDriver.java b/kyuubi-hive-jdbc/src/test/java/org/apache/kyuubi/jdbc/hive/TestJdbcDriver.java
index 228ad00ee2d..efdf7309277 100644
--- a/kyuubi-hive-jdbc/src/test/java/org/apache/kyuubi/jdbc/hive/TestJdbcDriver.java
+++ b/kyuubi-hive-jdbc/src/test/java/org/apache/kyuubi/jdbc/hive/TestJdbcDriver.java
@@ -24,6 +24,7 @@
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
+import java.nio.file.Files;
import java.util.Arrays;
import java.util.Collection;
import org.junit.AfterClass;
@@ -67,14 +68,14 @@ public static Collection
+    <dependency>
+      <groupId>org.apache.kyuubi</groupId>
+      <artifactId>kyuubi-util</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-api</artifactId>
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/AdminRestApi.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/AdminRestApi.java
index c81af593ae4..e315a96cc56 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/AdminRestApi.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/AdminRestApi.java
@@ -23,6 +23,7 @@
import java.util.Map;
import org.apache.kyuubi.client.api.v1.dto.Engine;
import org.apache.kyuubi.client.api.v1.dto.OperationData;
+import org.apache.kyuubi.client.api.v1.dto.ServerData;
import org.apache.kyuubi.client.api.v1.dto.SessionData;
public class AdminRestApi {
@@ -46,11 +47,21 @@ public String refreshUserDefaultsConf() {
return this.getClient().post(path, null, client.getAuthHeader());
}
+ public String refreshKubernetesConf() {
+ String path = String.format("%s/%s", API_BASE_PATH, "refresh/kubernetes_conf");
+ return this.getClient().post(path, null, client.getAuthHeader());
+ }
+
public String refreshUnlimitedUsers() {
String path = String.format("%s/%s", API_BASE_PATH, "refresh/unlimited_users");
return this.getClient().post(path, null, client.getAuthHeader());
}
+ public String refreshDenyUsers() {
+ String path = String.format("%s/%s", API_BASE_PATH, "refresh/deny_users");
+ return this.getClient().post(path, null, client.getAuthHeader());
+ }
+
public String deleteEngine(
String engineType, String shareLevel, String subdomain, String hs2ProxyUser) {
Map<String, Object> params = new HashMap<>();
@@ -99,6 +110,13 @@ public String closeOperation(String operationHandleStr) {
return this.getClient().delete(url, null, client.getAuthHeader());
}
+ public List<ServerData> listServers() {
+ ServerData[] result =
+ this.getClient()
+ .get(API_BASE_PATH + "/server", null, ServerData[].class, client.getAuthHeader());
+ return Arrays.asList(result);
+ }
+
private IRestClient getClient() {
return this.client.getHttpClient();
}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/BatchRestApi.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/BatchRestApi.java
index f5099568b21..7d113308df1 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/BatchRestApi.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/BatchRestApi.java
@@ -63,10 +63,23 @@ public GetBatchesResponse listBatches(
Long endTime,
int from,
int size) {
+ return listBatches(batchType, batchUser, batchState, null, createTime, endTime, from, size);
+ }
+
+ public GetBatchesResponse listBatches(
+ String batchType,
+ String batchUser,
+ String batchState,
+ String batchName,
+ Long createTime,
+ Long endTime,
+ int from,
+ int size) {
Map<String, Object> params = new HashMap<>();
params.put("batchType", batchType);
params.put("batchUser", batchUser);
params.put("batchState", batchState);
+ params.put("batchName", batchName);
if (null != createTime && createTime > 0) {
params.put("createTime", createTime);
}
@@ -102,8 +115,7 @@ private IRestClient getClient() {
private void setClientVersion(BatchRequest request) {
if (request != null) {
- Map<String, String> newConf = new HashMap<>();
- newConf.putAll(request.getConf());
+ Map<String, String> newConf = new HashMap<>(request.getConf());
newConf.put(VersionUtils.KYUUBI_CLIENT_VERSION_KEY, VersionUtils.getVersion());
request.setConf(newConf);
}
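The `setClientVersion` cleanup replaces an empty map plus `putAll` with `HashMap`'s copy constructor. Both forms produce an independent copy, so adding the client version key never mutates the caller's original conf map. A standalone sketch (key names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyThenAugment {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("kyuubi.engine.type", "SPARK_SQL");

    // Equivalent to: Map<String, String> newConf = new HashMap<>(); newConf.putAll(conf);
    Map<String, String> newConf = new HashMap<>(conf);
    newConf.put("kyuubi.client.version", "unknown");

    System.out.println(conf.containsKey("kyuubi.client.version"));    // false
    System.out.println(newConf.containsKey("kyuubi.client.version")); // true
  }
}
```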
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/KyuubiRestClient.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/KyuubiRestClient.java
index dbcc89b16d3..c83eff7e0a3 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/KyuubiRestClient.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/KyuubiRestClient.java
@@ -30,6 +30,8 @@ public class KyuubiRestClient implements AutoCloseable, Cloneable {
private RestClientConf conf;
+ private List<String> hostUrls;
+
private List<String> baseUrls;
private ApiVersion version;
@@ -77,14 +79,20 @@ public void setHostUrls(List<String> hostUrls) {
if (hostUrls.isEmpty()) {
throw new IllegalArgumentException("hostUrls cannot be blank.");
}
+ this.hostUrls = hostUrls;
List<String> baseUrls = initBaseUrls(hostUrls, version);
this.httpClient = RetryableRestClient.getRestClient(baseUrls, this.conf);
}
+ public List<String> getHostUrls() {
+ return hostUrls;
+ }
+
private KyuubiRestClient() {}
private KyuubiRestClient(Builder builder) {
this.version = builder.version;
+ this.hostUrls = builder.hostUrls;
this.baseUrls = initBaseUrls(builder.hostUrls, builder.version);
RestClientConf conf = new RestClientConf();
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/OperationRestApi.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/OperationRestApi.java
new file mode 100644
index 00000000000..ad659a5d463
--- /dev/null
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/OperationRestApi.java
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.client;
+
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.kyuubi.client.api.v1.dto.*;
+import org.apache.kyuubi.client.util.JsonUtils;
+
+public class OperationRestApi {
+
+ private KyuubiRestClient client;
+
+ private static final String API_BASE_PATH = "operations";
+
+ private OperationRestApi() {}
+
+ public OperationRestApi(KyuubiRestClient client) {
+ this.client = client;
+ }
+
+ public KyuubiOperationEvent getOperationEvent(String operationHandleStr) {
+ String path = String.format("%s/%s/event", API_BASE_PATH, operationHandleStr);
+ return this.getClient()
+ .get(path, new HashMap<>(), KyuubiOperationEvent.class, client.getAuthHeader());
+ }
+
+ public String applyOperationAction(OpActionRequest request, String operationHandleStr) {
+ String path = String.format("%s/%s", API_BASE_PATH, operationHandleStr);
+ return this.getClient().put(path, JsonUtils.toJson(request), client.getAuthHeader());
+ }
+
+ public ResultSetMetaData getResultSetMetadata(String operationHandleStr) {
+ String path = String.format("%s/%s/resultsetmetadata", API_BASE_PATH, operationHandleStr);
+ return this.getClient()
+ .get(path, new HashMap<>(), ResultSetMetaData.class, client.getAuthHeader());
+ }
+
+ public OperationLog getOperationLog(String operationHandleStr, int maxRows) {
+ String path = String.format("%s/%s/log", API_BASE_PATH, operationHandleStr);
+ Map<String, Object> params = new HashMap<>();
+ params.put("maxrows", maxRows);
+ return this.getClient().get(path, params, OperationLog.class, client.getAuthHeader());
+ }
+
+ public ResultRowSet getNextRowSet(String operationHandleStr) {
+ return getNextRowSet(operationHandleStr, null, null);
+ }
+
+ public ResultRowSet getNextRowSet(
+ String operationHandleStr, String fetchOrientation, Integer maxRows) {
+ String path = String.format("%s/%s/rowset", API_BASE_PATH, operationHandleStr);
+ Map<String, Object> params = new HashMap<>();
+ if (fetchOrientation != null) params.put("fetchorientation", fetchOrientation);
+ if (maxRows != null) params.put("maxrows", maxRows);
+ return this.getClient().get(path, params, ResultRowSet.class, client.getAuthHeader());
+ }
+
+ private IRestClient getClient() {
+ return this.client.getHttpClient();
+ }
+}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RestClient.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RestClient.java
index 6447d547765..e6d1d967420 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RestClient.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RestClient.java
@@ -114,7 +114,7 @@ public T post(
contentBody = new FileBody((File) payload);
break;
default:
- throw new RuntimeException("Unsupported multi part type:" + multiPart);
+ throw new RuntimeException("Unsupported multi part type:" + multiPart.getType());
}
entityBuilder.addPart(s, contentBody);
});
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RetryableRestClient.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RetryableRestClient.java
index dcd052acae4..d13151c2e4c 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RetryableRestClient.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/RetryableRestClient.java
@@ -22,7 +22,7 @@
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;
-import java.util.Random;
+import java.util.concurrent.ThreadLocalRandom;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.kyuubi.client.exception.RetryableKyuubiRestException;
import org.slf4j.Logger;
@@ -44,7 +44,7 @@ public class RetryableRestClient implements InvocationHandler {
private RetryableRestClient(List uris, RestClientConf conf) {
this.conf = conf;
this.uris = uris;
- this.currentUriIndex = new Random(System.currentTimeMillis()).nextInt(uris.size());
+ this.currentUriIndex = ThreadLocalRandom.current().nextInt(uris.size());
newRestClient();
}
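Replacing `new Random(System.currentTimeMillis())` with `ThreadLocalRandom.current()` avoids allocating a fresh generator per client and sidesteps the pitfall that two clients created in the same millisecond would share a seed and always pick the same starting URI. A self-contained sketch of the index selection (class name and URIs are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class StartingUriIndex {
  static int pickStartIndex(List<String> uris) {
    // nextInt(bound) returns a uniformly distributed value in [0, bound)
    return ThreadLocalRandom.current().nextInt(uris.size());
  }

  public static void main(String[] args) {
    List<String> uris = Arrays.asList("http://host1:10099", "http://host2:10099");
    int index = pickStartIndex(uris);
    System.out.println(index >= 0 && index < uris.size()); // true
  }
}
```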
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/Count.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/Count.java
new file mode 100644
index 00000000000..8f77ccd138d
--- /dev/null
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/Count.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.client.api.v1.dto;
+
+import java.util.Objects;
+import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
+import org.apache.commons.lang3.builder.ToStringStyle;
+
+public class Count {
+ private Integer count;
+
+ public Count() {}
+
+ public Count(Integer count) {
+ this.count = count;
+ }
+
+ public Integer getCount() {
+ return count;
+ }
+
+ public void setCount(Integer count) {
+ this.count = count;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) return true;
+ if (o == null || getClass() != o.getClass()) return false;
+ Count that = (Count) o;
+ return Objects.equals(getCount(), that.getCount());
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(getCount());
+ }
+
+ @Override
+ public String toString() {
+ return ReflectionToStringBuilder.toString(this, ToStringStyle.JSON_STYLE);
+ }
+}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiOperationEvent.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiOperationEvent.java
new file mode 100644
index 00000000000..13c40eecf78
--- /dev/null
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiOperationEvent.java
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.client.api.v1.dto;
+
+import java.util.Map;
+
+public class KyuubiOperationEvent {
+
+ private String statementId;
+
+ private String remoteId;
+
+ private String statement;
+
+ private boolean shouldRunAsync;
+
+ private String state;
+
+ private long eventTime;
+
+ private long createTime;
+
+ private long startTime;
+
+ private long completeTime;
+
+ private Throwable exception;
+
+ private String sessionId;
+
+ private String sessionUser;
+
+ private String sessionType;
+
+ private String kyuubiInstance;
+
+ private Map<String, String> metrics;
+
+ public KyuubiOperationEvent() {}
+
+ public KyuubiOperationEvent(
+ String statementId,
+ String remoteId,
+ String statement,
+ boolean shouldRunAsync,
+ String state,
+ long eventTime,
+ long createTime,
+ long startTime,
+ long completeTime,
+ Throwable exception,
+ String sessionId,
+ String sessionUser,
+ String sessionType,
+ String kyuubiInstance,
+ Map<String, String> metrics) {
+ this.statementId = statementId;
+ this.remoteId = remoteId;
+ this.statement = statement;
+ this.shouldRunAsync = shouldRunAsync;
+ this.state = state;
+ this.eventTime = eventTime;
+ this.createTime = createTime;
+ this.startTime = startTime;
+ this.completeTime = completeTime;
+ this.exception = exception;
+ this.sessionId = sessionId;
+ this.sessionUser = sessionUser;
+ this.sessionType = sessionType;
+ this.kyuubiInstance = kyuubiInstance;
+ this.metrics = metrics;
+ }
+
+ public static KyuubiOperationEvent.KyuubiOperationEventBuilder builder() {
+ return new KyuubiOperationEvent.KyuubiOperationEventBuilder();
+ }
+
+ public static class KyuubiOperationEventBuilder {
+ private String statementId;
+
+ private String remoteId;
+
+ private String statement;
+
+ private boolean shouldRunAsync;
+
+ private String state;
+
+ private long eventTime;
+
+ private long createTime;
+
+ private long startTime;
+
+ private long completeTime;
+
+ private Throwable exception;
+
+ private String sessionId;
+
+ private String sessionUser;
+
+ private String sessionType;
+
+ private String kyuubiInstance;
+
+ private Map<String, String> metrics;
+
+ public KyuubiOperationEventBuilder() {}
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder statementId(final String statementId) {
+ this.statementId = statementId;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder remoteId(final String remoteId) {
+ this.remoteId = remoteId;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder statement(final String statement) {
+ this.statement = statement;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder shouldRunAsync(
+ final boolean shouldRunAsync) {
+ this.shouldRunAsync = shouldRunAsync;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder state(final String state) {
+ this.state = state;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder eventTime(final long eventTime) {
+ this.eventTime = eventTime;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder createTime(final long createTime) {
+ this.createTime = createTime;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder startTime(final long startTime) {
+ this.startTime = startTime;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder completeTime(final long completeTime) {
+ this.completeTime = completeTime;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder exception(final Throwable exception) {
+ this.exception = exception;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder sessionId(final String sessionId) {
+ this.sessionId = sessionId;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder sessionUser(final String sessionUser) {
+ this.sessionUser = sessionUser;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder sessionType(final String sessionType) {
+ this.sessionType = sessionType;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder kyuubiInstance(
+ final String kyuubiInstance) {
+ this.kyuubiInstance = kyuubiInstance;
+ return this;
+ }
+
+ public KyuubiOperationEvent.KyuubiOperationEventBuilder metrics(
+ final Map<String, String> metrics) {
+ this.metrics = metrics;
+ return this;
+ }
+
+ public KyuubiOperationEvent build() {
+ return new KyuubiOperationEvent(
+ statementId,
+ remoteId,
+ statement,
+ shouldRunAsync,
+ state,
+ eventTime,
+ createTime,
+ startTime,
+ completeTime,
+ exception,
+ sessionId,
+ sessionUser,
+ sessionType,
+ kyuubiInstance,
+ metrics);
+ }
+ }
+
+ public String getStatementId() {
+ return statementId;
+ }
+
+ public void setStatementId(String statementId) {
+ this.statementId = statementId;
+ }
+
+ public String getRemoteId() {
+ return remoteId;
+ }
+
+ public void setRemoteId(String remoteId) {
+ this.remoteId = remoteId;
+ }
+
+ public String getStatement() {
+ return statement;
+ }
+
+ public void setStatement(String statement) {
+ this.statement = statement;
+ }
+
+ public boolean isShouldRunAsync() {
+ return shouldRunAsync;
+ }
+
+ public void setShouldRunAsync(boolean shouldRunAsync) {
+ this.shouldRunAsync = shouldRunAsync;
+ }
+
+ public String getState() {
+ return state;
+ }
+
+ public void setState(String state) {
+ this.state = state;
+ }
+
+ public long getEventTime() {
+ return eventTime;
+ }
+
+ public void setEventTime(long eventTime) {
+ this.eventTime = eventTime;
+ }
+
+ public long getCreateTime() {
+ return createTime;
+ }
+
+ public void setCreateTime(long createTime) {
+ this.createTime = createTime;
+ }
+
+ public long getStartTime() {
+ return startTime;
+ }
+
+ public void setStartTime(long startTime) {
+ this.startTime = startTime;
+ }
+
+ public long getCompleteTime() {
+ return completeTime;
+ }
+
+ public void setCompleteTime(long completeTime) {
+ this.completeTime = completeTime;
+ }
+
+ public Throwable getException() {
+ return exception;
+ }
+
+ public void setException(Throwable exception) {
+ this.exception = exception;
+ }
+
+ public String getSessionId() {
+ return sessionId;
+ }
+
+ public void setSessionId(String sessionId) {
+ this.sessionId = sessionId;
+ }
+
+ public String getSessionUser() {
+ return sessionUser;
+ }
+
+ public void setSessionUser(String sessionUser) {
+ this.sessionUser = sessionUser;
+ }
+
+ public String getSessionType() {
+ return sessionType;
+ }
+
+ public void setSessionType(String sessionType) {
+ this.sessionType = sessionType;
+ }
+
+ public String getKyuubiInstance() {
+ return kyuubiInstance;
+ }
+
+ public void setKyuubiInstance(String kyuubiInstance) {
+ this.kyuubiInstance = kyuubiInstance;
+ }
+
+ public Map<String, String> getMetrics() {
+ return metrics;
+ }
+
+ public void setMetrics(Map<String, String> metrics) {
+ this.metrics = metrics;
+ }
+}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiSessionEvent.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiSessionEvent.java
index 4c3cbcfd540..34d306fedb9 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiSessionEvent.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/KyuubiSessionEvent.java
@@ -19,7 +19,7 @@
import java.util.Map;
-public class KyuubiSessionEvent implements KyuubiEvent {
+public class KyuubiSessionEvent {
private String sessionId;
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/OperationData.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/OperationData.java
index 1b99bb2c690..70c2dd3f3a1 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/OperationData.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/OperationData.java
@@ -17,6 +17,8 @@
package org.apache.kyuubi.client.api.v1.dto;
+import java.util.Collections;
+import java.util.Map;
import java.util.Objects;
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
@@ -33,6 +35,7 @@ public class OperationData {
private String sessionUser;
private String sessionType;
private String kyuubiInstance;
+ private Map<String, String> metrics;
public OperationData() {}
@@ -47,7 +50,8 @@ public OperationData(
String sessionId,
String sessionUser,
String sessionType,
- String kyuubiInstance) {
+ String kyuubiInstance,
+ Map<String, String> metrics) {
this.identifier = identifier;
this.statement = statement;
this.state = state;
@@ -59,6 +63,7 @@ public OperationData(
this.sessionUser = sessionUser;
this.sessionType = sessionType;
this.kyuubiInstance = kyuubiInstance;
+ this.metrics = metrics;
}
public String getIdentifier() {
@@ -149,11 +154,22 @@ public void setKyuubiInstance(String kyuubiInstance) {
this.kyuubiInstance = kyuubiInstance;
}
+ public Map<String, String> getMetrics() {
+ if (null == metrics) {
+ return Collections.emptyMap();
+ }
+ return metrics;
+ }
+
+ public void setMetrics(Map<String, String> metrics) {
+ this.metrics = metrics;
+ }
+
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
- SessionData that = (SessionData) o;
+ OperationData that = (OperationData) o;
return Objects.equals(getIdentifier(), that.getIdentifier());
}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/ServerData.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/ServerData.java
new file mode 100644
index 00000000000..7b68763d28b
--- /dev/null
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/api/v1/dto/ServerData.java
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.client.api.v1.dto;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+
+public class ServerData {
+ private String nodeName;
+ private String namespace;
+ private String instance;
+ private String host;
+ private int port;
+ private Map<String, String> attributes;
+ private String status;
+
+ public ServerData() {}
+
+ public ServerData(
+ String nodeName,
+ String namespace,
+ String instance,
+ String host,
+ int port,
+ Map<String, String> attributes,
+ String status) {
+ this.nodeName = nodeName;
+ this.namespace = namespace;
+ this.instance = instance;
+ this.host = host;
+ this.port = port;
+ this.attributes = attributes;
+ this.status = status;
+ }
+
+ public String getNodeName() {
+ return nodeName;
+ }
+
+ public ServerData setNodeName(String nodeName) {
+ this.nodeName = nodeName;
+ return this;
+ }
+
+ public String getNamespace() {
+ return namespace;
+ }
+
+ public ServerData setNamespace(String namespace) {
+ this.namespace = namespace;
+ return this;
+ }
+
+ public String getInstance() {
+ return instance;
+ }
+
+ public ServerData setInstance(String instance) {
+ this.instance = instance;
+ return this;
+ }
+
+ public String getHost() {
+ return host;
+ }
+
+ public ServerData setHost(String host) {
+ this.host = host;
+ return this;
+ }
+
+ public int getPort() {
+ return port;
+ }
+
+ public ServerData setPort(int port) {
+ this.port = port;
+ return this;
+ }
+
+ public Map<String, String> getAttributes() {
+ if (null == attributes) {
+ return Collections.emptyMap();
+ }
+ return attributes;
+ }
+
+ public ServerData setAttributes(Map<String, String> attributes) {
+ this.attributes = attributes;
+ return this;
+ }
+
+ public String getStatus() {
+ return status;
+ }
+
+ public ServerData setStatus(String status) {
+ this.status = status;
+ return this;
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(nodeName, namespace, instance, host, port, status);
+ }
+
+ @Override
+ public boolean equals(Object obj) {
+ if (this == obj) return true;
+ if (obj == null || getClass() != obj.getClass()) return false;
+ ServerData server = (ServerData) obj;
+ return port == server.port
+ && Objects.equals(nodeName, server.nodeName)
+ && Objects.equals(namespace, server.namespace)
+ && Objects.equals(instance, server.instance)
+ && Objects.equals(host, server.host)
+ && Objects.equals(status, server.status);
+ }
+}
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/auth/SpnegoAuthHeaderGenerator.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/auth/SpnegoAuthHeaderGenerator.java
index 435a850142f..c66c6465ed1 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/auth/SpnegoAuthHeaderGenerator.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/auth/SpnegoAuthHeaderGenerator.java
@@ -17,13 +17,13 @@
package org.apache.kyuubi.client.auth;
-import java.lang.reflect.Field;
-import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.security.PrivilegedExceptionAction;
import java.util.Base64;
import javax.security.auth.Subject;
import org.apache.kyuubi.client.exception.KyuubiRestException;
+import org.apache.kyuubi.util.reflect.DynFields;
+import org.apache.kyuubi.util.reflect.DynMethods;
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
@@ -61,13 +61,17 @@ public String generateAuthHeader() {
private String generateToken(String server) throws Exception {
Subject subject;
try {
- Class<?> ugiClz = Class.forName(UGI_CLASS);
- Method ugiGetCurrentUserMethod = ugiClz.getDeclaredMethod("getCurrentUser");
- Object ugiCurrentUser = ugiGetCurrentUserMethod.invoke(null);
+ Object ugiCurrentUser =
+ DynMethods.builder("getCurrentUser")
+ .hiddenImpl(Class.forName(UGI_CLASS))
+ .buildStaticChecked()
+ .invoke();
LOG.debug("The user credential is {}", ugiCurrentUser);
- Field ugiSubjectField = ugiCurrentUser.getClass().getDeclaredField("subject");
- ugiSubjectField.setAccessible(true);
- subject = (Subject) ugiSubjectField.get(ugiCurrentUser);
+ subject =
+ DynFields.builder()
+ .hiddenImpl(ugiCurrentUser.getClass(), "subject")
+ .buildChecked(ugiCurrentUser)
+ .get();
} catch (ClassNotFoundException e) {
// TODO do kerberos authentication using JDK class directly
LOG.error("Hadoop UGI class {} is required for SPNEGO authentication.", UGI_CLASS);
diff --git a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/util/VersionUtils.java b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/util/VersionUtils.java
index bcabca5b9f8..1f8cedf4b0e 100644
--- a/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/util/VersionUtils.java
+++ b/kyuubi-rest-client/src/main/java/org/apache/kyuubi/client/util/VersionUtils.java
@@ -31,7 +31,10 @@ public static synchronized String getVersion() {
if (KYUUBI_CLIENT_VERSION == null) {
try {
Properties prop = new Properties();
- prop.load(VersionUtils.class.getClassLoader().getResourceAsStream("version.properties"));
+ prop.load(
+ VersionUtils.class
+ .getClassLoader()
+ .getResourceAsStream("org/apache/kyuubi/version.properties"));
KYUUBI_CLIENT_VERSION = prop.getProperty(KYUUBI_CLIENT_VERSION_KEY, "unknown");
} catch (Exception e) {
LOG.error("Error getting kyuubi client version", e);
diff --git a/kyuubi-rest-client/src/main/resources/version.properties b/kyuubi-rest-client/src/main/resources/org/apache/kyuubi/version.properties
similarity index 100%
rename from kyuubi-rest-client/src/main/resources/version.properties
rename to kyuubi-rest-client/src/main/resources/org/apache/kyuubi/version.properties
diff --git a/kyuubi-server/pom.xml b/kyuubi-server/pom.xml
index 7408ac5dd00..a8b133d2792 100644
--- a/kyuubi-server/pom.xml
+++ b/kyuubi-server/pom.xml
@@ -21,10 +21,10 @@
   <parent>
     <groupId>org.apache.kyuubi</groupId>
     <artifactId>kyuubi-parent</artifactId>
-    <version>1.8.0-SNAPSHOT</version>
+    <version>1.9.0-SNAPSHOT</version>
   </parent>

-  <artifactId>kyuubi-server_2.12</artifactId>
+  <artifactId>kyuubi-server_${scala.binary.version}</artifactId>
   <packaging>jar</packaging>
   <name>Kyuubi Project Server</name>
   <url>https://kyuubi.apache.org/</url>
@@ -252,11 +252,21 @@
       <artifactId>derby</artifactId>
     </dependency>

+    <dependency>
+      <groupId>org.xerial</groupId>
+      <artifactId>sqlite-jdbc</artifactId>
+    </dependency>
+
     <dependency>
       <groupId>io.trino</groupId>
       <artifactId>trino-client</artifactId>
     </dependency>

+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-proxy</artifactId>
+    </dependency>
+
     <dependency>
       <groupId>org.glassfish.jersey.test-framework</groupId>
       <artifactId>jersey-test-framework-core</artifactId>
@@ -395,6 +405,23 @@
       <artifactId>swagger-ui</artifactId>
     </dependency>

+    <dependency>
+      <groupId>org.apache.kafka</groupId>
+      <artifactId>kafka-clients</artifactId>
+    </dependency>
+
+    <dependency>
+      <groupId>com.dimafeng</groupId>
+      <artifactId>testcontainers-scala-scalatest_${scala.binary.version}</artifactId>
+      <scope>test</scope>
+    </dependency>
+
+    <dependency>
+      <groupId>com.dimafeng</groupId>
+      <artifactId>testcontainers-scala-kafka_${scala.binary.version}</artifactId>
+      <scope>test</scope>
+    </dependency>
+
     <dependency>
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-exec</artifactId>
@@ -427,42 +454,6 @@
       <scope>test</scope>
     </dependency>

-    <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-avro_${scala.binary.version}</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.parquet</groupId>
-      <artifactId>parquet-avro</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>hudi-common</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>hudi-spark-common_${scala.binary.version}</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>hudi-spark_${scala.binary.version}</artifactId>
-      <scope>test</scope>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>hudi-spark3.1.x_${scala.binary.version}</artifactId>
-      <scope>test</scope>
-    </dependency>
-
     <dependency>
       <groupId>io.delta</groupId>
       <artifactId>delta-core_${scala.binary.version}</artifactId>
@@ -495,7 +486,7 @@
     <dependency>
       <groupId>org.scalatestplus</groupId>
-      <artifactId>mockito-4-6_${scala.binary.version}</artifactId>
+      <artifactId>mockito-4-11_${scala.binary.version}</artifactId>
       <scope>test</scope>
     </dependency>
diff --git a/kyuubi-server/src/main/resources/sql/derby/003-KYUUBI-5078.derby.sql b/kyuubi-server/src/main/resources/sql/derby/003-KYUUBI-5078.derby.sql
new file mode 100644
index 00000000000..dfdfe6069d0
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/derby/003-KYUUBI-5078.derby.sql
@@ -0,0 +1 @@
+ALTER TABLE metadata ALTER COLUMN kyuubi_instance DROP NOT NULL;
diff --git a/kyuubi-server/src/main/resources/sql/derby/004-KYUUBI-5131.derby.sql b/kyuubi-server/src/main/resources/sql/derby/004-KYUUBI-5131.derby.sql
new file mode 100644
index 00000000000..6a3142ffd3d
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/derby/004-KYUUBI-5131.derby.sql
@@ -0,0 +1 @@
+CREATE INDEX metadata_create_time_index ON metadata(create_time);
diff --git a/kyuubi-server/src/main/resources/sql/derby/metadata-store-schema-1.8.0.derby.sql b/kyuubi-server/src/main/resources/sql/derby/metadata-store-schema-1.8.0.derby.sql
new file mode 100644
index 00000000000..8d333bda2bd
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/derby/metadata-store-schema-1.8.0.derby.sql
@@ -0,0 +1,38 @@
+-- Derby does not support `CREATE TABLE IF NOT EXISTS`
+
+-- the metadata table ddl
+
+CREATE TABLE metadata(
+ key_id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, -- the auto increment key id
+ identifier varchar(36) NOT NULL, -- the identifier id, which is an UUID
+ session_type varchar(32) NOT NULL, -- the session type, SQL or BATCH
+ real_user varchar(255) NOT NULL, -- the real user
+ user_name varchar(255) NOT NULL, -- the user name, might be a proxy user
+ ip_address varchar(128), -- the client ip address
+ kyuubi_instance varchar(1024), -- the kyuubi instance that creates this
+ state varchar(128) NOT NULL, -- the session state
+ resource varchar(1024), -- the main resource
+ class_name varchar(1024), -- the main class name
+ request_name varchar(1024), -- the request name
+ request_conf clob, -- the request config map
+ request_args clob, -- the request arguments
+ create_time BIGINT NOT NULL, -- the metadata create time
+ engine_type varchar(32) NOT NULL, -- the engine type
+ cluster_manager varchar(128), -- the engine cluster manager
+ engine_open_time bigint, -- the engine open time
+ engine_id varchar(128), -- the engine application id
+ engine_name clob, -- the engine application name
+ engine_url varchar(1024), -- the engine tracking url
+ engine_state varchar(32), -- the engine application state
+  engine_error clob, -- the engine application diagnostics
+ end_time bigint, -- the metadata end time
+ peer_instance_closed boolean default FALSE -- closed by peer kyuubi instance
+);
+
+CREATE UNIQUE INDEX metadata_unique_identifier_index ON metadata(identifier);
+
+CREATE INDEX metadata_user_name_index ON metadata(user_name);
+
+CREATE INDEX metadata_engine_type_index ON metadata(engine_type);
+
+CREATE INDEX metadata_create_time_index ON metadata(create_time);
diff --git a/kyuubi-server/src/main/resources/sql/derby/upgrade-1.7.0-to-1.8.0.derby.sql b/kyuubi-server/src/main/resources/sql/derby/upgrade-1.7.0-to-1.8.0.derby.sql
new file mode 100644
index 00000000000..234510665f8
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/derby/upgrade-1.7.0-to-1.8.0.derby.sql
@@ -0,0 +1,2 @@
+RUN '003-KYUUBI-5078.derby.sql';
+RUN '004-KYUUBI-5131.derby.sql';
diff --git a/kyuubi-server/src/main/resources/sql/mysql/003-KYUUBI-5078.mysql.sql b/kyuubi-server/src/main/resources/sql/mysql/003-KYUUBI-5078.mysql.sql
new file mode 100644
index 00000000000..1d730cd4cf2
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/mysql/003-KYUUBI-5078.mysql.sql
@@ -0,0 +1,3 @@
+SELECT '< KYUUBI-5078: Make kyuubi_instance nullable in metadata table schema >' AS ' ';
+
+ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
diff --git a/kyuubi-server/src/main/resources/sql/mysql/004-KYUUBI-5131.mysql.sql b/kyuubi-server/src/main/resources/sql/mysql/004-KYUUBI-5131.mysql.sql
new file mode 100644
index 00000000000..e743fc3d73e
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/mysql/004-KYUUBI-5131.mysql.sql
@@ -0,0 +1,3 @@
+SELECT '< KYUUBI-5131: Create index on metastore.create_time >' AS ' ';
+
+ALTER TABLE metadata ADD INDEX create_time_index(create_time);
diff --git a/kyuubi-server/src/main/resources/sql/mysql/metadata-store-schema-1.8.0.mysql.sql b/kyuubi-server/src/main/resources/sql/mysql/metadata-store-schema-1.8.0.mysql.sql
new file mode 100644
index 00000000000..77df8fa0562
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/mysql/metadata-store-schema-1.8.0.mysql.sql
@@ -0,0 +1,32 @@
+-- the metadata table ddl
+
+CREATE TABLE IF NOT EXISTS metadata(
+ key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
+  identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is a UUID',
+ session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
+ real_user varchar(255) NOT NULL COMMENT 'the real user',
+ user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
+ ip_address varchar(128) COMMENT 'the client ip address',
+ kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this',
+ state varchar(128) NOT NULL COMMENT 'the session state',
+ resource varchar(1024) COMMENT 'the main resource',
+ class_name varchar(1024) COMMENT 'the main class name',
+ request_name varchar(1024) COMMENT 'the request name',
+ request_conf mediumtext COMMENT 'the request config map',
+ request_args mediumtext COMMENT 'the request arguments',
+ create_time BIGINT NOT NULL COMMENT 'the metadata create time',
+ engine_type varchar(32) NOT NULL COMMENT 'the engine type',
+ cluster_manager varchar(128) COMMENT 'the engine cluster manager',
+ engine_open_time bigint COMMENT 'the engine open time',
+ engine_id varchar(128) COMMENT 'the engine application id',
+ engine_name mediumtext COMMENT 'the engine application name',
+ engine_url varchar(1024) COMMENT 'the engine tracking url',
+ engine_state varchar(32) COMMENT 'the engine application state',
+  engine_error mediumtext COMMENT 'the engine application diagnostics',
+ end_time bigint COMMENT 'the metadata end time',
+ peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
+ UNIQUE INDEX unique_identifier_index(identifier),
+ INDEX user_name_index(user_name),
+ INDEX engine_type_index(engine_type),
+ INDEX create_time_index(create_time)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
diff --git a/kyuubi-server/src/main/resources/sql/mysql/upgrade-1.7.0-to-1.8.0.mysql.sql b/kyuubi-server/src/main/resources/sql/mysql/upgrade-1.7.0-to-1.8.0.mysql.sql
new file mode 100644
index 00000000000..473997448ba
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/mysql/upgrade-1.7.0-to-1.8.0.mysql.sql
@@ -0,0 +1,4 @@
+SELECT '< Upgrading MetaStore schema from 1.7.0 to 1.8.0 >' AS ' ';
+SOURCE 003-KYUUBI-5078.mysql.sql;
+SOURCE 004-KYUUBI-5131.mysql.sql;
+SELECT '< Finished upgrading MetaStore schema from 1.7.0 to 1.8.0 >' AS ' ';
diff --git a/kyuubi-server/src/main/resources/sql/sqlite/README b/kyuubi-server/src/main/resources/sql/sqlite/README
new file mode 100644
index 00000000000..de15931f552
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/sqlite/README
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+Kyuubi MetaStore Upgrade HowTo
+==============================
+
+This document describes how to upgrade the schema of a SQLite backed
+Kyuubi MetaStore instance from one release version of Kyuubi to another
+release version of Kyuubi. For example, by following the steps listed
+below it is possible to upgrade a Kyuubi 1.8.0 MetaStore schema to a
+Kyuubi 1.9.0 MetaStore schema. Before attempting this project we
+strongly recommend that you read through all of the steps in this
+document and familiarize yourself with the required tools.
+
+MetaStore Upgrade Steps
+=======================
+
+1) Shutdown your MetaStore instance and restrict access to the
+ MetaStore's SQLite database. It is very important that no one else
+   accesses or modifies the contents of the database while you are
+ performing the schema upgrade.
+
+2) Create a backup of your SQLite metastore database. This will allow
+ you to revert any changes made during the upgrade process if
+ something goes wrong. The `sqlite3` command is the easiest way to
+ create a backup of a SQLite database:
+
+   % sqlite3 <db_name>.db '.backup <db_name>_backup.db'
+
+3) Dump your metastore database schema to a file. We use the `sqlite3`
+ utility again, but this time with a command line option that
+ specifies we are only interested in dumping the DDL statements
+ required to create the schema:
+
+   % sqlite3 <db_name>.db '.schema' > schema-x.y.z.sqlite.sql
+
+4) The schema upgrade scripts assume that the schema you are upgrading
+ closely matches the official schema for your particular version of
+ Kyuubi. The files in this directory with names like
+ "metadata-store-schema-x.y.z.sqlite.sql" contain dumps of the official schemas
+ corresponding to each of the released versions of Kyuubi. You can
+ determine differences between your schema and the official schema
+ by diffing the contents of the official dump with the schema dump
+ you created in the previous step. Some differences are acceptable
+ and will not interfere with the upgrade process, but others need to
+ be resolved manually or the upgrade scripts will fail to complete.
+
+5) You are now ready to run the schema upgrade scripts. If you are
+ upgrading from Kyuubi 1.8.0 to Kyuubi 1.9.0 you need to run the
+ upgrade-1.8.0-to-1.9.0.sqlite.sql script, but if you are upgrading
+ from 1.8.0 to 2.0.0 you will need to run the 1.8.0 to 1.9.0 upgrade
+ script followed by the 1.9.0 to 2.0.0 upgrade script.
+
+   % sqlite3 <db_name>.db
+ sqlite> .read upgrade-1.8.0-to-1.9.0.sqlite.sql
+ sqlite> .read upgrade-1.9.0-to-2.0.0.sqlite.sql
+
+ These scripts should run to completion without any errors. If you
+ do encounter errors you need to analyze the cause and attempt to
+ trace it back to one of the preceding steps.
+
+6) The final step of the upgrade process is validating your freshly
+ upgraded schema against the official schema for your particular
+ version of Kyuubi. This is accomplished by repeating steps (3) and
+ (4), but this time comparing against the official version of the
+ upgraded schema, e.g. if you upgraded the schema to Kyuubi 1.9.0 then
+ you will want to compare your schema dump against the contents of
+ metadata-store-schema-1.9.0.sqlite.sql
diff --git a/kyuubi-server/src/main/resources/sql/sqlite/metadata-store-schema-1.8.0.sqlite.sql b/kyuubi-server/src/main/resources/sql/sqlite/metadata-store-schema-1.8.0.sqlite.sql
new file mode 100644
index 00000000000..656de6e5d62
--- /dev/null
+++ b/kyuubi-server/src/main/resources/sql/sqlite/metadata-store-schema-1.8.0.sqlite.sql
@@ -0,0 +1,36 @@
+-- the metadata table ddl
+
+CREATE TABLE IF NOT EXISTS metadata(
+ key_id INTEGER PRIMARY KEY AUTOINCREMENT, -- the auto increment key id
+  identifier varchar(36) NOT NULL, -- the identifier id, which is a UUID
+ session_type varchar(32) NOT NULL, -- the session type, SQL or BATCH
+ real_user varchar(255) NOT NULL, -- the real user
+ user_name varchar(255) NOT NULL, -- the user name, might be a proxy user
+ ip_address varchar(128), -- the client ip address
+ kyuubi_instance varchar(1024), -- the kyuubi instance that creates this
+ state varchar(128) NOT NULL, -- the session state
+ resource varchar(1024), -- the main resource
+ class_name varchar(1024), -- the main class name
+ request_name varchar(1024), -- the request name
+ request_conf mediumtext, -- the request config map
+ request_args mediumtext, -- the request arguments
+ create_time BIGINT NOT NULL, -- the metadata create time
+ engine_type varchar(32) NOT NULL, -- the engine type
+ cluster_manager varchar(128), -- the engine cluster manager
+ engine_open_time bigint, -- the engine open time
+ engine_id varchar(128), -- the engine application id
+ engine_name mediumtext, -- the engine application name
+ engine_url varchar(1024), -- the engine tracking url
+ engine_state varchar(32), -- the engine application state
+  engine_error mediumtext, -- the engine application diagnostics
+ end_time bigint, -- the metadata end time
+ peer_instance_closed boolean default '0' -- closed by peer kyuubi instance
+);
+
+CREATE UNIQUE INDEX IF NOT EXISTS metadata_unique_identifier_index ON metadata(identifier);
+
+CREATE INDEX IF NOT EXISTS metadata_user_name_index ON metadata(user_name);
+
+CREATE INDEX IF NOT EXISTS metadata_engine_type_index ON metadata(engine_type);
+
+CREATE INDEX IF NOT EXISTS metadata_create_time_index ON metadata(create_time);
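The two schema changes carried by this release (KYUUBI-5078: nullable `kyuubi_instance`; KYUUBI-5131: the `create_time` index) can be exercised directly against SQLite. This sketch trims the table to the relevant columns and is illustrative only:

```python
import sqlite3

# Exercise the 1.8.0 SQLite schema changes: kyuubi_instance is nullable
# (KYUUBI-5078) and create_time is indexed (KYUUBI-5131). Columns are
# trimmed to the ones relevant here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS metadata(
    key_id INTEGER PRIMARY KEY AUTOINCREMENT,
    identifier varchar(36) NOT NULL,
    kyuubi_instance varchar(1024),      -- nullable: may be filled in later
    create_time BIGINT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS metadata_unique_identifier_index ON metadata(identifier);
CREATE INDEX IF NOT EXISTS metadata_create_time_index ON metadata(create_time);
""")

# A row without kyuubi_instance is now legal.
conn.execute(
    "INSERT INTO metadata(identifier, create_time) VALUES (?, ?)",
    ("demo-uuid", 1700000000000))

# PRAGMA index_list rows are (seq, name, unique, origin, partial).
index_names = {row[1] for row in conn.execute("PRAGMA index_list(metadata)")}
```

Listing the indexes this way is also a quick check in step 6 of the README, when comparing an upgraded database against the official schema dump.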
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala
index 8b8561fa99f..ad7191c090c 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala
@@ -52,6 +52,8 @@ class KyuubiSyncThriftClient private (
@volatile private var _engineUrl: Option[String] = _
@volatile private var _engineName: Option[String] = _
+ private[kyuubi] def engineConnectionClosed: Boolean = !protocol.getTransport.isOpen
+
private val lock = new ReentrantLock()
// Visible for testing.
@@ -59,12 +61,14 @@ class KyuubiSyncThriftClient private (
@volatile private var _aliveProbeSessionHandle: TSessionHandle = _
@volatile private var remoteEngineBroken: Boolean = false
- @volatile private var clientClosedOnEngineBroken: Boolean = false
+ @volatile private var clientClosedByAliveProbe: Boolean = false
private val engineAliveProbeClient = engineAliveProbeProtocol.map(new TCLIService.Client(_))
private var engineAliveThreadPool: ScheduledExecutorService = _
@volatile private var engineLastAlive: Long = _
- private var asyncRequestExecutor: ExecutorService = _
+ private lazy val asyncRequestExecutor: ExecutorService =
+ ThreadUtils.newDaemonSingleThreadScheduledExecutor(
+ "async-request-executor-" + SessionHandle(_remoteSessionHandle))
@VisibleForTesting
@volatile private[kyuubi] var asyncRequestInterrupted: Boolean = false
@@ -72,11 +76,6 @@ class KyuubiSyncThriftClient private (
@VisibleForTesting
private[kyuubi] def getEngineAliveProbeProtocol: Option[TProtocol] = engineAliveProbeProtocol
- private def newAsyncRequestExecutor(): ExecutorService = {
- ThreadUtils.newDaemonSingleThreadScheduledExecutor(
- "async-request-executor-" + _remoteSessionHandle)
- }
-
private def shutdownAsyncRequestExecutor(): Unit = {
Option(asyncRequestExecutor).filterNot(_.isShutdown).foreach(ThreadUtils.shutdown(_))
asyncRequestInterrupted = true
@@ -87,7 +86,7 @@ class KyuubiSyncThriftClient private (
"engine-alive-probe-" + _aliveProbeSessionHandle)
val task = new Runnable {
override def run(): Unit = {
- if (!remoteEngineBroken) {
+ if (!remoteEngineBroken && !engineConnectionClosed) {
engineAliveProbeClient.foreach { client =>
val tGetInfoReq = new TGetInfoReq()
tGetInfoReq.setSessionHandle(_aliveProbeSessionHandle)
@@ -109,7 +108,6 @@ class KyuubiSyncThriftClient private (
}
}
} else {
- shutdownAsyncRequestExecutor()
warn(s"Removing Clients for ${_remoteSessionHandle}")
Seq(protocol).union(engineAliveProbeProtocol.toSeq).foreach { tProtocol =>
Utils.tryLogNonFatalError {
@@ -117,10 +115,11 @@ class KyuubiSyncThriftClient private (
tProtocol.getTransport.close()
}
}
- clientClosedOnEngineBroken = true
- Option(engineAliveThreadPool).foreach { pool =>
- ThreadUtils.shutdown(pool, Duration(engineAliveProbeInterval, TimeUnit.MILLISECONDS))
- }
+ }
+ clientClosedByAliveProbe = true
+ shutdownAsyncRequestExecutor()
+ Option(engineAliveThreadPool).foreach { pool =>
+ ThreadUtils.shutdown(pool, Duration(engineAliveProbeInterval, TimeUnit.MILLISECONDS))
}
}
}
@@ -136,19 +135,16 @@ class KyuubiSyncThriftClient private (
/**
* Lock every rpc call to send them sequentially
*/
- private def withLockAcquired[T](block: => T): T = {
- try {
- lock.lock()
- if (!protocol.getTransport.isOpen) {
- throw KyuubiSQLException.connectionDoesNotExist()
- }
- block
- } finally lock.unlock()
+ private def withLockAcquired[T](block: => T): T = Utils.withLockRequired(lock) {
+ if (engineConnectionClosed) {
+ throw KyuubiSQLException.connectionDoesNotExist()
+ }
+ block
}
private def withLockAcquiredAsyncRequest[T](block: => T): T = withLockAcquired {
- if (asyncRequestExecutor == null || asyncRequestExecutor.isShutdown) {
- asyncRequestExecutor = newAsyncRequestExecutor()
+ if (asyncRequestExecutor.isShutdown) {
+ throw KyuubiSQLException.connectionDoesNotExist()
}
val task = asyncRequestExecutor.submit(() => {
@@ -212,7 +208,7 @@ class KyuubiSyncThriftClient private (
}
def closeSession(): Unit = {
- if (clientClosedOnEngineBroken) return
+ if (clientClosedByAliveProbe) return
try {
if (_remoteSessionHandle != null) {
val req = new TCloseSessionReq(_remoteSessionHandle)
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/credentials/HadoopCredentialsManager.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/credentials/HadoopCredentialsManager.scala
index fe710e67839..b51255b716f 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/credentials/HadoopCredentialsManager.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/credentials/HadoopCredentialsManager.scala
@@ -17,13 +17,11 @@
package org.apache.kyuubi.credentials
-import java.util.ServiceLoader
import java.util.concurrent._
import scala.collection.JavaConverters._
import scala.collection.mutable
-import scala.concurrent.Future
-import scala.concurrent.Promise
+import scala.concurrent.{Future, Promise}
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success, Try}
@@ -35,6 +33,7 @@ import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.service.AbstractService
import org.apache.kyuubi.util.{KyuubiHadoopUtils, ThreadUtils}
+import org.apache.kyuubi.util.reflect.ReflectUtils._
/**
* [[HadoopCredentialsManager]] manages and renews delegation tokens, which are used by SQL engines
@@ -315,13 +314,10 @@ object HadoopCredentialsManager extends Logging {
private val providerEnabledConfig = "kyuubi.credentials.%s.enabled"
def loadProviders(kyuubiConf: KyuubiConf): Map[String, HadoopDelegationTokenProvider] = {
- val loader =
- ServiceLoader.load(
- classOf[HadoopDelegationTokenProvider],
- Utils.getContextOrKyuubiClassLoader)
val providers = mutable.ArrayBuffer[HadoopDelegationTokenProvider]()
- val iterator = loader.iterator
+ val iterator =
+ loadFromServiceLoader[HadoopDelegationTokenProvider](Utils.getContextOrKyuubiClassLoader)
while (iterator.hasNext) {
try {
providers += iterator.next
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala
index a2b3d0f7616..23a49c1ae5f 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala
@@ -35,31 +35,48 @@ trait ApplicationOperation {
/**
* Called before other method to do a quick skip
*
- * @param clusterManager the underlying cluster manager or just local instance
+ * @param appMgrInfo the application manager information
*/
- def isSupported(clusterManager: Option[String]): Boolean
+ def isSupported(appMgrInfo: ApplicationManagerInfo): Boolean
/**
* Kill the app/engine by the unique application tag
*
+ * @param appMgrInfo the application manager information
* @param tag the unique application tag for engine instance.
* For example,
* if the Hadoop Yarn is used, for spark applications,
* the tag will be preset via spark.yarn.tags
+ * @param proxyUser the proxy user to use for executing kill commands.
+ * For secured YARN cluster, the Kyuubi Server's user typically
+ * has no permission to kill the application. Admin user or
+ * application owner should be used instead.
    * @return a message containing the response that describes the kill process.
*
* @note For implementations, please suppress exceptions and always return KillResponse
*/
- def killApplicationByTag(tag: String): KillResponse
+ def killApplicationByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None): KillResponse
/**
* Get the engine/application status by the unique application tag
*
+ * @param appMgrInfo the application manager information
* @param tag the unique application tag for engine instance.
* @param submitTime engine submit to resourceManager time
+ * @param proxyUser the proxy user to use for creating YARN client
+ * For secured YARN cluster, the Kyuubi Server's user may have no permission
+ * to operate the application. Admin user or application owner could be used
+ * instead.
* @return [[ApplicationInfo]]
*/
- def getApplicationInfoByTag(tag: String, submitTime: Option[Long] = None): ApplicationInfo
+ def getApplicationInfoByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None,
+ submitTime: Option[Long] = None): ApplicationInfo
}
object ApplicationState extends Enumeration {
@@ -108,3 +125,22 @@ object ApplicationInfo {
object ApplicationOperation {
val NOT_FOUND = "APPLICATION_NOT_FOUND"
}
+
+case class KubernetesInfo(context: Option[String] = None, namespace: Option[String] = None)
+
+case class ApplicationManagerInfo(
+ resourceManager: Option[String],
+ kubernetesInfo: KubernetesInfo = KubernetesInfo())
+
+object ApplicationManagerInfo {
+ final val DEFAULT_KUBERNETES_NAMESPACE = "default"
+
+ def apply(
+ resourceManager: Option[String],
+ kubernetesContext: Option[String],
+ kubernetesNamespace: Option[String]): ApplicationManagerInfo = {
+ new ApplicationManagerInfo(
+ resourceManager,
+ KubernetesInfo(kubernetesContext, kubernetesNamespace))
+ }
+}
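The new `ApplicationManagerInfo`/`KubernetesInfo` shapes above nest the Kubernetes context and namespace behind the resource manager. For readers less familiar with Scala case classes, this hedged Python analogue (names mirror the Scala; it is not part of the patch) shows how the three-argument factory builds the nested value:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class KubernetesInfo:
    # Mirrors the Scala case class: both fields default to "absent".
    context: Optional[str] = None
    namespace: Optional[str] = None

@dataclass(frozen=True)
class ApplicationManagerInfo:
    resource_manager: Optional[str]
    kubernetes_info: KubernetesInfo = field(default_factory=KubernetesInfo)

def app_mgr_info(resource_manager, kubernetes_context=None, kubernetes_namespace=None):
    # Mirrors the companion object's three-argument apply(...) overload.
    return ApplicationManagerInfo(
        resource_manager, KubernetesInfo(kubernetes_context, kubernetes_namespace))

info = app_mgr_info("k8s://https://example:6443", "ctx-a", "ns-spark")
```

Keeping the Kubernetes coordinates in their own value type lets the operation cache one client per `(context, namespace)` pair, as the `KubernetesApplicationOperation` changes below this section do.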
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala
index 63b37f1c5d8..6122a6f138f 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.engine
-import java.util.concurrent.TimeUnit
+import java.util.concurrent.{Semaphore, TimeUnit}
import scala.collection.JavaConverters._
import scala.util.Random
@@ -43,6 +43,7 @@ import org.apache.kyuubi.metrics.MetricsConstants.{ENGINE_FAIL, ENGINE_TIMEOUT,
import org.apache.kyuubi.metrics.MetricsSystem
import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.plugin.GroupProvider
+import org.apache.kyuubi.server.KyuubiServer
/**
* The description and functionality of an engine at server side
@@ -56,7 +57,8 @@ private[kyuubi] class EngineRef(
user: String,
groupProvider: GroupProvider,
engineRefId: String,
- engineManager: KyuubiApplicationManager)
+ engineManager: KyuubiApplicationManager,
+ startupProcessSemaphore: Option[Semaphore] = None)
extends Logging {
// The corresponding ServerSpace where the engine belongs to
private val serverSpace: String = conf.get(HA_NAMESPACE)
@@ -69,7 +71,8 @@ private[kyuubi] class EngineRef(
private val engineType: EngineType = EngineType.withName(conf.get(ENGINE_TYPE))
// Server-side engine pool size threshold
- private val poolThreshold: Int = conf.get(ENGINE_POOL_SIZE_THRESHOLD)
+ private val poolThreshold: Int = Option(KyuubiServer.kyuubiServer).map(_.getConf)
+ .getOrElse(KyuubiConf()).get(ENGINE_POOL_SIZE_THRESHOLD)
private val clientPoolSize: Int = conf.get(ENGINE_POOL_SIZE)
@@ -192,6 +195,7 @@ private[kyuubi] class EngineRef(
case TRINO =>
new TrinoProcessBuilder(appUser, conf, engineRefId, extraEngineLog)
case HIVE_SQL =>
+ conf.setIfMissing(HiveProcessBuilder.HIVE_ENGINE_NAME, defaultEngineName)
new HiveProcessBuilder(appUser, conf, engineRefId, extraEngineLog)
case JDBC =>
new JdbcProcessBuilder(appUser, conf, engineRefId, extraEngineLog)
@@ -200,16 +204,25 @@ private[kyuubi] class EngineRef(
}
MetricsSystem.tracing(_.incCount(ENGINE_TOTAL))
+ var acquiredPermit = false
try {
+ if (!startupProcessSemaphore.forall(_.tryAcquire(timeout, TimeUnit.MILLISECONDS))) {
+ MetricsSystem.tracing(_.incCount(MetricRegistry.name(ENGINE_TIMEOUT, appUser)))
+ throw KyuubiSQLException(
+ s"Timeout($timeout ms, you can modify ${ENGINE_INIT_TIMEOUT.key} to change it) to" +
+          s" acquire a permit from the engine builder semaphore.")
+ }
+ acquiredPermit = true
val redactedCmd = builder.toString
info(s"Launching engine:\n$redactedCmd")
builder.validateConf
val process = builder.start
var exitValue: Option[Int] = None
+ var lastApplicationInfo: Option[ApplicationInfo] = None
while (engineRef.isEmpty) {
if (exitValue.isEmpty && process.waitFor(1, TimeUnit.SECONDS)) {
exitValue = Some(process.exitValue())
- if (exitValue.get != 0) {
+ if (exitValue != Some(0)) {
val error = builder.getError
MetricsSystem.tracing { ms =>
ms.incCount(MetricRegistry.name(ENGINE_FAIL, appUser))
@@ -219,14 +232,33 @@ private[kyuubi] class EngineRef(
}
}
+ if (started + timeout <= System.currentTimeMillis()) {
+ val killMessage =
+ engineManager.killApplication(builder.appMgrInfo(), engineRefId, Some(appUser))
+ builder.close(true)
+ MetricsSystem.tracing(_.incCount(MetricRegistry.name(ENGINE_TIMEOUT, appUser)))
+ throw KyuubiSQLException(
+ s"Timeout($timeout ms, you can modify ${ENGINE_INIT_TIMEOUT.key} to change it) to" +
+            s" launch $engineType engine with $redactedCmd. $killMessage",
+ builder.getError)
+ }
+ engineRef = discoveryClient.getEngineByRefId(engineSpace, engineRefId)
+
// even the submit process succeeds, the application might meet failure when initializing,
// check the engine application state from engine manager and fast fail on engine terminate
- if (exitValue == Some(0)) {
+ if (engineRef.isEmpty && exitValue == Some(0)) {
Option(engineManager).foreach { engineMgr =>
- engineMgr.getApplicationInfo(
- builder.clusterManager(),
+ if (lastApplicationInfo.isDefined) {
+ TimeUnit.SECONDS.sleep(1)
+ }
+
+ val applicationInfo = engineMgr.getApplicationInfo(
+ builder.appMgrInfo(),
engineRefId,
- Some(started)).foreach { appInfo =>
+ Some(appUser),
+ Some(started))
+
+ applicationInfo.foreach { appInfo =>
if (ApplicationState.isTerminated(appInfo.state)) {
MetricsSystem.tracing { ms =>
ms.incCount(MetricRegistry.name(ENGINE_FAIL, appUser))
@@ -240,25 +272,23 @@ private[kyuubi] class EngineRef(
builder.getError)
}
}
- }
- }
- if (started + timeout <= System.currentTimeMillis()) {
- val killMessage = engineManager.killApplication(builder.clusterManager(), engineRefId)
- process.destroyForcibly()
- MetricsSystem.tracing(_.incCount(MetricRegistry.name(ENGINE_TIMEOUT, appUser)))
- throw KyuubiSQLException(
- s"Timeout($timeout ms, you can modify ${ENGINE_INIT_TIMEOUT.key} to change it) to" +
- s" launched $engineType engine with $redactedCmd. $killMessage",
- builder.getError)
+ lastApplicationInfo = applicationInfo
+ }
}
- engineRef = discoveryClient.getEngineByRefId(engineSpace, engineRefId)
}
engineRef.get
} finally {
+ if (acquiredPermit) startupProcessSemaphore.foreach(_.release())
+ val waitCompletion = conf.get(KyuubiConf.SESSION_ENGINE_STARTUP_WAIT_COMPLETION)
+ val destroyProcess = !waitCompletion && builder.isClusterMode()
+ if (destroyProcess) {
+ info("Destroy the builder process because waitCompletion is false" +
+ " and the engine is running in cluster mode.")
+ }
// we must close the process builder whether session open is success or failure since
// we have a log capture thread in process builder.
- builder.close()
+ builder.close(destroyProcess)
}
}
@@ -280,9 +310,9 @@ private[kyuubi] class EngineRef(
def close(): Unit = {
if (shareLevel == CONNECTION && builder != null) {
try {
- val clusterManager = builder.clusterManager()
+ val appMgrInfo = builder.appMgrInfo()
builder.close(true)
- engineManager.killApplication(clusterManager, engineRefId)
+ engineManager.killApplication(appMgrInfo, engineRefId, Some(appUser))
} catch {
case e: Exception =>
warn(s"Error closing engine builder, engineRefId: $engineRefId", e)
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala
index ce2e054617a..1d0d58d167c 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala
@@ -41,8 +41,9 @@ class JpsApplicationOperation extends ApplicationOperation {
}
}
- override def isSupported(clusterManager: Option[String]): Boolean = {
- runner != null && (clusterManager.isEmpty || clusterManager.get == "local")
+ override def isSupported(appMgrInfo: ApplicationManagerInfo): Boolean = {
+ runner != null &&
+ (appMgrInfo.resourceManager.isEmpty || appMgrInfo.resourceManager.get == "local")
}
private def getEngine(tag: String): Option[String] = {
@@ -80,11 +81,18 @@ class JpsApplicationOperation extends ApplicationOperation {
}
}
- override def killApplicationByTag(tag: String): KillResponse = {
+ override def killApplicationByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None): KillResponse = {
killJpsApplicationByTag(tag, true)
}
- override def getApplicationInfoByTag(tag: String, submitTime: Option[Long]): ApplicationInfo = {
+ override def getApplicationInfoByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None,
+ submitTime: Option[Long] = None): ApplicationInfo = {
val commandOption = getEngine(tag)
if (commandOption.nonEmpty) {
val idAndCmd = commandOption.get
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationAuditLogger.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationAuditLogger.scala
new file mode 100644
index 00000000000..731b9d7b5ba
--- /dev/null
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationAuditLogger.scala
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.engine
+
+import io.fabric8.kubernetes.api.model.Pod
+
+import org.apache.kyuubi.Logging
+import org.apache.kyuubi.engine.KubernetesApplicationOperation.{toApplicationState, LABEL_KYUUBI_UNIQUE_KEY, SPARK_APP_ID_LABEL}
+
+object KubernetesApplicationAuditLogger extends Logging {
+ final private val AUDIT_BUFFER = new ThreadLocal[StringBuilder]() {
+ override protected def initialValue: StringBuilder = new StringBuilder()
+ }
+
+ def audit(kubernetesInfo: KubernetesInfo, pod: Pod): Unit = {
+ val sb = AUDIT_BUFFER.get()
+ sb.setLength(0)
+ sb.append(s"label=${pod.getMetadata.getLabels.get(LABEL_KYUUBI_UNIQUE_KEY)}").append("\t")
+ sb.append(s"context=${kubernetesInfo.context.orNull}").append("\t")
+ sb.append(s"namespace=${kubernetesInfo.namespace.orNull}").append("\t")
+ sb.append(s"pod=${pod.getMetadata.getName}").append("\t")
+ sb.append(s"appId=${pod.getMetadata.getLabels.get(SPARK_APP_ID_LABEL)}").append("\t")
+ sb.append(s"appState=${toApplicationState(pod.getStatus.getPhase)}")
+ info(sb.toString())
+ }
+}
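The audit line emitted by `KubernetesApplicationAuditLogger.audit` above is a sequence of tab-separated `key=value` fields. A rough Python equivalent of just the formatting (field names mirror the Scala; illustrative only, and `None` renders as `None` where Scala's `orNull` renders `null`) looks like:

```python
def format_audit_line(label, context, namespace, pod, app_id, app_state):
    # Mirrors the tab-separated key=value layout built in audit(...).
    fields = [
        f"label={label}",
        f"context={context}",
        f"namespace={namespace}",
        f"pod={pod}",
        f"appId={app_id}",
        f"appState={app_state}",
    ]
    return "\t".join(fields)
```

The Scala code additionally reuses a thread-local `StringBuilder` to avoid per-event allocation, a detail the sketch omits.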
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
index 83792f52f79..16a0c29d149 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
@@ -17,25 +17,36 @@
package org.apache.kyuubi.engine
+import java.util.Locale
import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import scala.collection.JavaConverters._
+
import com.google.common.cache.{Cache, CacheBuilder, RemovalNotification}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.KubernetesClient
import io.fabric8.kubernetes.client.informers.{ResourceEventHandler, SharedIndexInformer}
-import org.apache.kyuubi.{Logging, Utils}
+import org.apache.kyuubi.{KyuubiException, Logging, Utils}
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.engine.ApplicationState.{isTerminated, ApplicationState, FAILED, FINISHED, NOT_FOUND, PENDING, RUNNING, UNKNOWN}
-import org.apache.kyuubi.engine.KubernetesApplicationOperation.{toApplicationState, LABEL_KYUUBI_UNIQUE_KEY, SPARK_APP_ID_LABEL}
+import org.apache.kyuubi.engine.KubernetesApplicationOperation.{toApplicationState, toLabel, LABEL_KYUUBI_UNIQUE_KEY, SPARK_APP_ID_LABEL}
import org.apache.kyuubi.util.KubernetesUtils
class KubernetesApplicationOperation extends ApplicationOperation with Logging {
- @volatile
- private var kubernetesClient: KubernetesClient = _
- private var enginePodInformer: SharedIndexInformer[Pod] = _
+ private val kubernetesClients: ConcurrentHashMap[KubernetesInfo, KubernetesClient] =
+ new ConcurrentHashMap[KubernetesInfo, KubernetesClient]
+ private val enginePodInformers: ConcurrentHashMap[KubernetesInfo, SharedIndexInformer[Pod]] =
+ new ConcurrentHashMap[KubernetesInfo, SharedIndexInformer[Pod]]
+
private var submitTimeout: Long = _
+ private var kyuubiConf: KyuubiConf = _
+
+ private def allowedContexts: Set[String] =
+ kyuubiConf.get(KyuubiConf.KUBERNETES_CONTEXT_ALLOW_LIST)
+ private def allowedNamespaces: Set[String] =
+ kyuubiConf.get(KyuubiConf.KUBERNETES_NAMESPACE_ALLOW_LIST)
// key is kyuubi_unique_key
private val appInfoStore: ConcurrentHashMap[String, ApplicationInfo] =
@@ -43,112 +54,156 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
// key is kyuubi_unique_key
private var cleanupTerminatedAppInfoTrigger: Cache[String, ApplicationState] = _
- override def initialize(conf: KyuubiConf): Unit = {
- info("Start initializing Kubernetes Client.")
- kubernetesClient = KubernetesUtils.buildKubernetesClient(conf) match {
+ private def getOrCreateKubernetesClient(kubernetesInfo: KubernetesInfo): KubernetesClient = {
+ checkKubernetesInfo(kubernetesInfo)
+ kubernetesClients.computeIfAbsent(kubernetesInfo, kInfo => buildKubernetesClient(kInfo))
+ }
+
+ // Visible for testing
+ private[engine] def checkKubernetesInfo(kubernetesInfo: KubernetesInfo): Unit = {
+ val context = kubernetesInfo.context
+ val namespace = kubernetesInfo.namespace
+
+ if (allowedContexts.nonEmpty && context.exists(!allowedContexts.contains(_))) {
+ throw new KyuubiException(
+ s"Kubernetes context $context is not in the allowed list[$allowedContexts]")
+ }
+
+ if (allowedNamespaces.nonEmpty && namespace.exists(!allowedNamespaces.contains(_))) {
+ throw new KyuubiException(
+ s"Kubernetes namespace $namespace is not in the allowed list[$allowedNamespaces]")
+ }
+ }
+
+ private def buildKubernetesClient(kubernetesInfo: KubernetesInfo): KubernetesClient = {
+ val kubernetesConf =
+ kyuubiConf.getKubernetesConf(kubernetesInfo.context, kubernetesInfo.namespace)
+ KubernetesUtils.buildKubernetesClient(kubernetesConf) match {
case Some(client) =>
- info(s"Initialized Kubernetes Client connect to: ${client.getMasterUrl}")
- submitTimeout = conf.get(KyuubiConf.ENGINE_SUBMIT_TIMEOUT)
- // Disable resync, see https://github.com/fabric8io/kubernetes-client/discussions/5015
- enginePodInformer = client.pods()
+ info(s"[$kubernetesInfo] Initialized Kubernetes Client connect to: ${client.getMasterUrl}")
+ val enginePodInformer = client.pods()
.withLabel(LABEL_KYUUBI_UNIQUE_KEY)
- .inform(new SparkEnginePodEventHandler)
- info("Start Kubernetes Client Informer.")
- // Defer cleaning terminated application information
- val retainPeriod = conf.get(KyuubiConf.KUBERNETES_TERMINATED_APPLICATION_RETAIN_PERIOD)
- cleanupTerminatedAppInfoTrigger = CacheBuilder.newBuilder()
- .expireAfterWrite(retainPeriod, TimeUnit.MILLISECONDS)
- .removalListener((notification: RemovalNotification[String, ApplicationState]) => {
- Option(appInfoStore.remove(notification.getKey)).foreach { removed =>
- info(s"Remove terminated application ${removed.id} with " +
- s"tag ${notification.getKey} and state ${removed.state}")
- }
- })
- .build()
+ .inform(new SparkEnginePodEventHandler(kubernetesInfo))
+ info(s"[$kubernetesInfo] Start Kubernetes Client Informer.")
+ enginePodInformers.put(kubernetesInfo, enginePodInformer)
client
- case None =>
- warn("Fail to init Kubernetes Client for Kubernetes Application Operation")
- null
+
+ case None => throw new KyuubiException(s"Fail to build Kubernetes client for $kubernetesInfo")
}
}
- override def isSupported(clusterManager: Option[String]): Boolean = {
+ override def initialize(conf: KyuubiConf): Unit = {
+ kyuubiConf = conf
+ info("Start initializing Kubernetes application operation.")
+ submitTimeout = conf.get(KyuubiConf.ENGINE_KUBERNETES_SUBMIT_TIMEOUT)
+ // Defer cleaning terminated application information
+ val retainPeriod = conf.get(KyuubiConf.KUBERNETES_TERMINATED_APPLICATION_RETAIN_PERIOD)
+ cleanupTerminatedAppInfoTrigger = CacheBuilder.newBuilder()
+ .expireAfterWrite(retainPeriod, TimeUnit.MILLISECONDS)
+ .removalListener((notification: RemovalNotification[String, ApplicationState]) => {
+ Option(appInfoStore.remove(notification.getKey)).foreach { removed =>
+ info(s"Remove terminated application ${removed.id} with " +
+ s"[${toLabel(notification.getKey)}, state: ${removed.state}]")
+ }
+ })
+ .build()
+ }
+
+ override def isSupported(appMgrInfo: ApplicationManagerInfo): Boolean = {
// TODO add deploy mode to check whether is supported
- kubernetesClient != null && clusterManager.nonEmpty &&
- clusterManager.get.toLowerCase.startsWith("k8s")
+ kyuubiConf != null &&
+ appMgrInfo.resourceManager.exists(_.toLowerCase(Locale.ROOT).startsWith("k8s"))
}
- override def killApplicationByTag(tag: String): KillResponse = {
- if (kubernetesClient == null) {
+ override def killApplicationByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None): KillResponse = {
+ if (kyuubiConf == null) {
throw new IllegalStateException("Methods initialize and isSupported must be called ahead")
}
- debug(s"Deleting application info from Kubernetes cluster by $tag tag")
+ val kubernetesInfo = appMgrInfo.kubernetesInfo
+ val kubernetesClient = getOrCreateKubernetesClient(kubernetesInfo)
+ debug(s"[$kubernetesInfo] Deleting application[${toLabel(tag)}]'s info from Kubernetes cluster")
try {
- val info = appInfoStore.getOrDefault(tag, ApplicationInfo.NOT_FOUND)
- debug(s"Application info[tag: $tag] is in ${info.state}")
- info.state match {
- case NOT_FOUND | FAILED | UNKNOWN =>
- (
- false,
- s"Target application[tag: $tag] is in ${info.state} status")
- case _ =>
+ Option(appInfoStore.get(tag)) match {
+ case Some(info) =>
+ debug(s"Application[${toLabel(tag)}] is in ${info.state} state")
+ info.state match {
+ case NOT_FOUND | FAILED | UNKNOWN =>
+ (
+ false,
+ s"[$kubernetesInfo] Target application[${toLabel(tag)}] is in ${info.state} state")
+ case _ =>
+ (
+ !kubernetesClient.pods.withName(info.name).delete().isEmpty,
+              s"[$kubernetesInfo] Operation of deleting" +
+ s" application[appId: ${info.id}, ${toLabel(tag)}] is completed")
+ }
+ case None =>
+ warn(s"No application info found, trying to delete pod with ${toLabel(tag)}")
(
- !kubernetesClient.pods.withName(info.name).delete().isEmpty,
- s"Operation of deleted application[appId: ${info.id} ,tag: $tag] is completed")
+ !kubernetesClient.pods.withLabel(LABEL_KYUUBI_UNIQUE_KEY, tag).delete().isEmpty,
+            s"[$kubernetesInfo] Operation of deleting pod with ${toLabel(tag)} is completed")
}
} catch {
case e: Exception =>
- (false, s"Failed to terminate application with $tag, due to ${e.getMessage}")
+ (
+ false,
+ s"[$kubernetesInfo] Failed to terminate application[${toLabel(tag)}], " +
+ s"due to ${e.getMessage}")
}
}
- override def getApplicationInfoByTag(tag: String, submitTime: Option[Long]): ApplicationInfo = {
- if (kubernetesClient == null) {
+ override def getApplicationInfoByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None,
+ submitTime: Option[Long] = None): ApplicationInfo = {
+ if (kyuubiConf == null) {
throw new IllegalStateException("Methods initialize and isSupported must be called ahead")
}
- debug(s"Getting application info from Kubernetes cluster by $tag tag")
+ debug(s"Getting application[${toLabel(tag)}]'s info from Kubernetes cluster")
try {
+      // initialize the Kubernetes client if it does not exist yet
+ getOrCreateKubernetesClient(appMgrInfo.kubernetesInfo)
val appInfo = appInfoStore.getOrDefault(tag, ApplicationInfo.NOT_FOUND)
(appInfo.state, submitTime) match {
      // Kyuubi should wait a while if the pod has not been created yet
case (NOT_FOUND, Some(_submitTime)) =>
val elapsedTime = System.currentTimeMillis - _submitTime
if (elapsedTime > submitTimeout) {
- error(s"Can't find target driver pod by tag: $tag, " +
+ error(s"Can't find target driver pod by ${toLabel(tag)}, " +
s"elapsed time: ${elapsedTime}ms exceeds ${submitTimeout}ms.")
ApplicationInfo.NOT_FOUND
} else {
- warn("Wait for driver pod to be created, " +
+ warn(s"Waiting for driver pod with ${toLabel(tag)} to be created, " +
s"elapsed time: ${elapsedTime}ms, return UNKNOWN status")
ApplicationInfo.UNKNOWN
}
case (NOT_FOUND, None) =>
ApplicationInfo.NOT_FOUND
case _ =>
- debug(s"Successfully got application info by $tag: $appInfo")
+ debug(s"Successfully got application[${toLabel(tag)}]'s info: $appInfo")
appInfo
}
} catch {
case e: Exception =>
- error(s"Failed to get application with $tag, due to ${e.getMessage}")
+ error(s"Failed to get application by ${toLabel(tag)}, due to ${e.getMessage}")
ApplicationInfo.NOT_FOUND
}
}
override def stop(): Unit = {
- Utils.tryLogNonFatalError {
- if (enginePodInformer != null) {
- enginePodInformer.stop()
- enginePodInformer = null
- }
+ enginePodInformers.asScala.foreach { case (_, informer) =>
+ Utils.tryLogNonFatalError(informer.stop())
}
+ enginePodInformers.clear()
- Utils.tryLogNonFatalError {
- if (kubernetesClient != null) {
- kubernetesClient.close()
- kubernetesClient = null
- }
+ kubernetesClients.asScala.foreach { case (_, client) =>
+ Utils.tryLogNonFatalError(client.close())
}
+ kubernetesClients.clear()
if (cleanupTerminatedAppInfoTrigger != null) {
cleanupTerminatedAppInfoTrigger.cleanUp()
@@ -156,11 +211,13 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
}
}
- private class SparkEnginePodEventHandler extends ResourceEventHandler[Pod] {
+ private class SparkEnginePodEventHandler(kubernetesInfo: KubernetesInfo)
+ extends ResourceEventHandler[Pod] {
override def onAdd(pod: Pod): Unit = {
if (isSparkEnginePod(pod)) {
updateApplicationState(pod)
+ KubernetesApplicationAuditLogger.audit(kubernetesInfo, pod)
}
}
@@ -171,6 +228,7 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
if (isTerminated(appState)) {
markApplicationTerminated(newPod)
}
+ KubernetesApplicationAuditLogger.audit(kubernetesInfo, newPod)
}
}
@@ -178,6 +236,7 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
if (isSparkEnginePod(pod)) {
updateApplicationState(pod)
markApplicationTerminated(pod)
+ KubernetesApplicationAuditLogger.audit(kubernetesInfo, pod)
}
}
}
@@ -199,10 +258,11 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
error = Option(pod.getStatus.getReason)))
}
- private def markApplicationTerminated(pod: Pod): Unit = {
- cleanupTerminatedAppInfoTrigger.put(
- pod.getMetadata.getLabels.get(LABEL_KYUUBI_UNIQUE_KEY),
- toApplicationState(pod.getStatus.getPhase))
+ private def markApplicationTerminated(pod: Pod): Unit = synchronized {
+ val key = pod.getMetadata.getLabels.get(LABEL_KYUUBI_UNIQUE_KEY)
+ if (cleanupTerminatedAppInfoTrigger.getIfPresent(key) == null) {
+ cleanupTerminatedAppInfoTrigger.put(key, toApplicationState(pod.getStatus.getPhase))
+ }
}
}
@@ -212,6 +272,8 @@ object KubernetesApplicationOperation extends Logging {
val KUBERNETES_SERVICE_HOST = "KUBERNETES_SERVICE_HOST"
val KUBERNETES_SERVICE_PORT = "KUBERNETES_SERVICE_PORT"
+ def toLabel(tag: String): String = s"label: $LABEL_KYUUBI_UNIQUE_KEY=$tag"
+
def toApplicationState(state: String): ApplicationState = state match {
// https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/types.go#L2396
// https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KyuubiApplicationManager.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KyuubiApplicationManager.scala
index 9b23e550d07..f8b64005359 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KyuubiApplicationManager.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KyuubiApplicationManager.scala
@@ -20,9 +20,8 @@ package org.apache.kyuubi.engine
import java.io.File
import java.net.{URI, URISyntaxException}
import java.nio.file.{Files, Path}
-import java.util.{Locale, ServiceLoader}
+import java.util.Locale
-import scala.collection.JavaConverters._
import scala.util.control.NonFatal
import org.apache.kyuubi.{KyuubiException, Utils}
@@ -31,14 +30,13 @@ import org.apache.kyuubi.engine.KubernetesApplicationOperation.LABEL_KYUUBI_UNIQ
import org.apache.kyuubi.engine.flink.FlinkProcessBuilder
import org.apache.kyuubi.engine.spark.SparkProcessBuilder
import org.apache.kyuubi.service.AbstractService
+import org.apache.kyuubi.util.reflect.ReflectUtils._
class KyuubiApplicationManager extends AbstractService("KyuubiApplicationManager") {
// TODO: maybe add a configuration is better
- private val operations = {
- ServiceLoader.load(classOf[ApplicationOperation], Utils.getContextOrKyuubiClassLoader)
- .iterator().asScala.toSeq
- }
+ private val operations =
+ loadFromServiceLoader[ApplicationOperation](Utils.getContextOrKyuubiClassLoader).toSeq
override def initialize(conf: KyuubiConf): Unit = {
operations.foreach { op =>
@@ -62,11 +60,14 @@ class KyuubiApplicationManager extends AbstractService("KyuubiApplicationManager
super.stop()
}
- def killApplication(resourceManager: Option[String], tag: String): KillResponse = {
+ def killApplication(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None): KillResponse = {
var (killed, lastMessage): KillResponse = (false, null)
for (operation <- operations if !killed) {
- if (operation.isSupported(resourceManager)) {
- val (k, m) = operation.killApplicationByTag(tag)
+ if (operation.isSupported(appMgrInfo)) {
+ val (k, m) = operation.killApplicationByTag(appMgrInfo, tag, proxyUser)
killed = k
lastMessage = m
}
@@ -75,7 +76,7 @@ class KyuubiApplicationManager extends AbstractService("KyuubiApplicationManager
val finalMessage =
if (lastMessage == null) {
s"No ${classOf[ApplicationOperation]} Service found in ServiceLoader" +
- s" for $resourceManager"
+ s" for $appMgrInfo"
} else {
lastMessage
}
@@ -83,12 +84,13 @@ class KyuubiApplicationManager extends AbstractService("KyuubiApplicationManager
}
def getApplicationInfo(
- clusterManager: Option[String],
+ appMgrInfo: ApplicationManagerInfo,
tag: String,
+ proxyUser: Option[String] = None,
submitTime: Option[Long] = None): Option[ApplicationInfo] = {
- val operation = operations.find(_.isSupported(clusterManager))
+ val operation = operations.find(_.isSupported(appMgrInfo))
operation match {
- case Some(op) => Some(op.getApplicationInfoByTag(tag, submitTime))
+ case Some(op) => Some(op.getApplicationInfoByTag(appMgrInfo, tag, proxyUser, submitTime))
case None => None
}
}
@@ -105,10 +107,10 @@ object KyuubiApplicationManager {
conf.set("spark.kubernetes.driver.label." + LABEL_KYUUBI_UNIQUE_KEY, tag)
}
- private def setupFlinkK8sTag(tag: String, conf: KyuubiConf): Unit = {
- val originalTag = conf.getOption(FlinkProcessBuilder.TAG_KEY).map(_ + ",").getOrElse("")
+ private def setupFlinkYarnTag(tag: String, conf: KyuubiConf): Unit = {
+ val originalTag = conf.getOption(FlinkProcessBuilder.YARN_TAG_KEY).map(_ + ",").getOrElse("")
val newTag = s"${originalTag}KYUUBI" + Some(tag).filterNot(_.isEmpty).map("," + _).getOrElse("")
- conf.set(FlinkProcessBuilder.TAG_KEY, newTag)
+ conf.set(FlinkProcessBuilder.YARN_TAG_KEY, newTag)
}
val uploadWorkDir: Path = {
@@ -176,9 +178,9 @@ object KyuubiApplicationManager {
// if the master is not identified ahead, add all tags
setupSparkYarnTag(applicationTag, conf)
setupSparkK8sTag(applicationTag, conf)
- case ("FLINK", _) =>
+ case ("FLINK", Some("YARN")) =>
// running flink on other platforms is not yet supported
- setupFlinkK8sTag(applicationTag, conf)
+ setupFlinkYarnTag(applicationTag, conf)
// other engine types are running locally yet
case _ =>
}
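The renamed `setupFlinkYarnTag` above composes the YARN tags string purely from existing conf values: keep any pre-configured tags, append `KYUUBI`, then append the unique engine tag. The string logic can be checked in isolation (the `KyuubiConf` plumbing is replaced by a plain `Option` here):

```scala
// Mirrors the tag-composition logic of setupFlinkYarnTag, with the
// KyuubiConf lookup replaced by a plain Option for illustration.
object FlinkYarnTagSketch {
  def newYarnTags(existing: Option[String], tag: String): String = {
    val originalTag = existing.map(_ + ",").getOrElse("")
    s"${originalTag}KYUUBI" +
      Some(tag).filterNot(_.isEmpty).map("," + _).getOrElse("")
  }
}
```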
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
index 4c7330b4dd5..44b317c71ea 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
@@ -155,7 +155,10 @@ trait ProcBuilder {
@volatile private var error: Throwable = UNCAUGHT_ERROR
private val engineLogMaxLines = conf.get(KyuubiConf.SESSION_ENGINE_STARTUP_MAX_LOG_LINES)
- private val waitCompletion = conf.get(KyuubiConf.SESSION_ENGINE_STARTUP_WAIT_COMPLETION)
+
+ private val engineStartupDestroyTimeout =
+ conf.get(KyuubiConf.SESSION_ENGINE_STARTUP_DESTROY_TIMEOUT)
+
protected val lastRowsOfLog: EvictingQueue[String] = EvictingQueue.create(engineLogMaxLines)
// Visible for test
@volatile private[kyuubi] var logCaptureThreadReleased: Boolean = true
@@ -249,14 +252,15 @@ trait ProcBuilder {
process
}
- def close(destroyProcess: Boolean = !waitCompletion): Unit = synchronized {
+ def isClusterMode(): Boolean = false
+
+ def close(destroyProcess: Boolean): Unit = synchronized {
if (logCaptureThread != null) {
logCaptureThread.interrupt()
logCaptureThread = null
}
if (destroyProcess && process != null) {
- info("Destroy the process, since waitCompletion is false.")
- process.destroyForcibly()
+ Utils.terminateProcess(process, engineStartupDestroyTimeout)
process = null
}
}
@@ -336,15 +340,18 @@ trait ProcBuilder {
protected def validateEnv(requiredEnv: String): Throwable = {
KyuubiSQLException(s"$requiredEnv is not set! For more information on installing and " +
s"configuring $requiredEnv, please visit https://kyuubi.readthedocs.io/en/master/" +
- s"deployment/settings.html#environments")
+ s"configuration/settings.html#environments")
}
def clusterManager(): Option[String] = None
+ def appMgrInfo(): ApplicationManagerInfo = ApplicationManagerInfo(None)
}
object ProcBuilder extends Logging {
private val PROC_BUILD_LOGGER = new NamedThreadFactory("process-logger-capture", daemon = true)
private val UNCAUGHT_ERROR = new RuntimeException("Uncaught error")
+
+ private[engine] val KYUUBI_ENGINE_LOG_PATH_KEY = "kyuubi.engine.engineLog.path"
}
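The `close()` change above replaces an unconditional `destroyForcibly()` with `Utils.terminateProcess(process, engineStartupDestroyTimeout)`, Kyuubi's own helper. A common shape for such a helper, sketched here as an assumption rather than Kyuubi's actual implementation, is a graceful `destroy()` followed by `destroyForcibly()` only if the process has not exited within the timeout:

```scala
import java.util.concurrent.TimeUnit

// Sketch of graceful-then-forcible process termination with a timeout.
// NOT Kyuubi's actual Utils.terminateProcess; illustrative only.
object TerminateSketch {
  def terminate(process: Process, gracefulMillis: Long): Unit = {
    process.destroy() // polite request first (SIGTERM on POSIX)
    if (!process.waitFor(gracefulMillis, TimeUnit.MILLISECONDS)) {
      process.destroyForcibly() // escalate if still alive after the timeout
    }
  }
}
```

Giving the engine process a grace window lets it flush logs and release cluster resources before a hard kill.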
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala
index e836e65da99..1f672ad701e 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala
@@ -17,13 +17,18 @@
package org.apache.kyuubi.engine
+import java.util.Locale
+
import scala.collection.JavaConverters._
+import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient
-import org.apache.kyuubi.Logging
+import org.apache.kyuubi.{Logging, Utils}
import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf.YarnUserStrategy
+import org.apache.kyuubi.config.KyuubiConf.YarnUserStrategy._
import org.apache.kyuubi.engine.ApplicationOperation._
import org.apache.kyuubi.engine.ApplicationState.ApplicationState
import org.apache.kyuubi.engine.YarnApplicationOperation.toApplicationState
@@ -31,85 +36,136 @@ import org.apache.kyuubi.util.KyuubiHadoopUtils
class YarnApplicationOperation extends ApplicationOperation with Logging {
- @volatile private var yarnClient: YarnClient = _
+ private var yarnConf: Configuration = _
+ @volatile private var adminYarnClient: Option[YarnClient] = None
+ private var submitTimeout: Long = _
override def initialize(conf: KyuubiConf): Unit = {
- val yarnConf = KyuubiHadoopUtils.newYarnConfiguration(conf)
- // YarnClient is thread-safe
- val c = YarnClient.createYarnClient()
- c.init(yarnConf)
- c.start()
- yarnClient = c
- info(s"Successfully initialized yarn client: ${c.getServiceState}")
+ submitTimeout = conf.get(KyuubiConf.ENGINE_YARN_SUBMIT_TIMEOUT)
+ yarnConf = KyuubiHadoopUtils.newYarnConfiguration(conf)
+
+ def createYarnClientWithCurrentUser(): Unit = {
+ val c = createYarnClient(yarnConf)
+ info(s"Creating admin YARN client with current user: ${Utils.currentUser}.")
+ adminYarnClient = Some(c)
+ }
+
+ def createYarnClientWithProxyUser(proxyUser: String): Unit = Utils.doAs(proxyUser) { () =>
+ val c = createYarnClient(yarnConf)
+ info(s"Creating admin YARN client with proxy user: $proxyUser.")
+ adminYarnClient = Some(c)
+ }
+
+ YarnUserStrategy.withName(conf.get(KyuubiConf.YARN_USER_STRATEGY)) match {
+ case NONE =>
+ createYarnClientWithCurrentUser()
+ case ADMIN if conf.get(KyuubiConf.YARN_USER_ADMIN) == Utils.currentUser =>
+ createYarnClientWithCurrentUser()
+ case ADMIN =>
+ createYarnClientWithProxyUser(conf.get(KyuubiConf.YARN_USER_ADMIN))
+ case OWNER =>
+ info("Skip initializing admin YARN client")
+ }
}
- override def isSupported(clusterManager: Option[String]): Boolean = {
- yarnClient != null && clusterManager.nonEmpty && "yarn".equalsIgnoreCase(clusterManager.get)
+ private def createYarnClient(_yarnConf: Configuration): YarnClient = {
+ // YarnClient is thread-safe
+ val yarnClient = YarnClient.createYarnClient()
+ yarnClient.init(_yarnConf)
+ yarnClient.start()
+ yarnClient
}
- override def killApplicationByTag(tag: String): KillResponse = {
- if (yarnClient != null) {
- try {
- val reports = yarnClient.getApplications(null, null, Set(tag).asJava)
- if (reports.isEmpty) {
- (false, NOT_FOUND)
- } else {
+ private def withYarnClient[T](proxyUser: Option[String])(action: YarnClient => T): T = {
+ (adminYarnClient, proxyUser) match {
+ case (Some(yarnClient), _) =>
+ action(yarnClient)
+ case (None, Some(user)) =>
+ Utils.doAs(user) { () =>
+ var yarnClient: YarnClient = null
try {
- val applicationId = reports.get(0).getApplicationId
- yarnClient.killApplication(applicationId)
- (true, s"Succeeded to terminate: $applicationId with $tag")
- } catch {
- case e: Exception =>
- (false, s"Failed to terminate application with $tag, due to ${e.getMessage}")
+ yarnClient = createYarnClient(yarnConf)
+ action(yarnClient)
+ } finally {
+            if (yarnClient != null) Utils.tryLogNonFatalError(yarnClient.close())
}
}
- } catch {
- case e: Exception =>
- (
- false,
- s"Failed to get while terminating application with tag $tag," +
- s" due to ${e.getMessage}")
- }
- } else {
- throw new IllegalStateException("Methods initialize and isSupported must be called ahead")
+ case (None, None) =>
+ throw new IllegalStateException("Methods initialize and isSupported must be called ahead")
}
}
- override def getApplicationInfoByTag(tag: String, submitTime: Option[Long]): ApplicationInfo = {
- if (yarnClient != null) {
- debug(s"Getting application info from Yarn cluster by $tag tag")
+ override def isSupported(appMgrInfo: ApplicationManagerInfo): Boolean =
+ appMgrInfo.resourceManager.exists(_.toLowerCase(Locale.ROOT).startsWith("yarn"))
+
+ override def killApplicationByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None): KillResponse = withYarnClient(proxyUser) { yarnClient =>
+ try {
val reports = yarnClient.getApplications(null, null, Set(tag).asJava)
if (reports.isEmpty) {
- debug(s"Application with tag $tag not found")
- ApplicationInfo(id = null, name = null, state = ApplicationState.NOT_FOUND)
+ (false, NOT_FOUND)
} else {
- val report = reports.get(0)
- val info = ApplicationInfo(
- id = report.getApplicationId.toString,
- name = report.getName,
- state = toApplicationState(
- report.getApplicationId.toString,
- report.getYarnApplicationState,
- report.getFinalApplicationStatus),
- url = Option(report.getTrackingUrl),
- error = Option(report.getDiagnostics))
- debug(s"Successfully got application info by $tag: $info")
- info
+ try {
+ val applicationId = reports.get(0).getApplicationId
+ yarnClient.killApplication(applicationId)
+ (true, s"Succeeded to terminate: $applicationId with $tag")
+ } catch {
+ case e: Exception =>
+ (false, s"Failed to terminate application with $tag, due to ${e.getMessage}")
+ }
}
- } else {
- throw new IllegalStateException("Methods initialize and isSupported must be called ahead")
+ } catch {
+ case e: Exception =>
+ (
+ false,
+          s"Failed to get application info while terminating application with tag $tag," +
+          s" due to ${e.getMessage}")
}
}
- override def stop(): Unit = {
- if (yarnClient != null) {
- try {
- yarnClient.stop()
- } catch {
- case e: Exception => error(e.getMessage)
+ override def getApplicationInfoByTag(
+ appMgrInfo: ApplicationManagerInfo,
+ tag: String,
+ proxyUser: Option[String] = None,
+ submitTime: Option[Long] = None): ApplicationInfo = withYarnClient(proxyUser) { yarnClient =>
+    debug(s"Getting application info from YARN cluster by tag: $tag")
+ val reports = yarnClient.getApplications(null, null, Set(tag).asJava)
+ if (reports.isEmpty) {
+ debug(s"Application with tag $tag not found")
+ submitTime match {
+ case Some(_submitTime) =>
+ val elapsedTime = System.currentTimeMillis - _submitTime
+ if (elapsedTime > submitTimeout) {
+ error(s"Can't find target yarn application by tag: $tag, " +
+ s"elapsed time: ${elapsedTime}ms exceeds ${submitTimeout}ms.")
+ ApplicationInfo.NOT_FOUND
+ } else {
+        warn("Waiting for YARN application to be submitted, " +
+ s"elapsed time: ${elapsedTime}ms, return UNKNOWN status")
+ ApplicationInfo.UNKNOWN
+ }
+ case _ => ApplicationInfo.NOT_FOUND
}
+ } else {
+ val report = reports.get(0)
+ val info = ApplicationInfo(
+ id = report.getApplicationId.toString,
+ name = report.getName,
+ state = toApplicationState(
+ report.getApplicationId.toString,
+ report.getYarnApplicationState,
+ report.getFinalApplicationStatus),
+ url = Option(report.getTrackingUrl),
+ error = Option(report.getDiagnostics))
+ debug(s"Successfully got application info by $tag: $info")
+ info
}
}
+
+ override def stop(): Unit = adminYarnClient.foreach { yarnClient =>
+ Utils.tryLogNonFatalError(yarnClient.stop())
+ }
}
object YarnApplicationOperation extends Logging {
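The `withYarnClient` helper introduced above dispatches on the pair `(adminYarnClient, proxyUser)`: a shared admin client wins when present, otherwise a short-lived client is created as the proxy user, and calling it before `initialize` fails fast. That decision logic can be sketched with `YarnClient` reduced to a `String` (the real code additionally runs the proxy-user branch inside `Utils.doAs` and closes the short-lived client in a `finally` block):

```scala
// Decision logic of withYarnClient, with YarnClient reduced to a String.
object YarnClientDispatchSketch {
  def chooseClient(admin: Option[String], proxyUser: Option[String]): String =
    (admin, proxyUser) match {
      case (Some(c), _)       => s"reuse shared admin client: $c"
      case (None, Some(user)) => s"create short-lived client as: $user"
      case (None, None) =>
        throw new IllegalStateException(
          "Methods initialize and isSupported must be called ahead")
    }
}
```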
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/flink/FlinkProcessBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/flink/FlinkProcessBuilder.scala
index b8146c4d2b6..f43adfbc216 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/flink/FlinkProcessBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/flink/FlinkProcessBuilder.scala
@@ -21,15 +21,15 @@ import java.io.{File, FilenameFilter}
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, ListBuffer}
import com.google.common.annotations.VisibleForTesting
import org.apache.kyuubi._
-import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.{KyuubiConf, KyuubiReservedKeys}
import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
-import org.apache.kyuubi.engine.{KyuubiApplicationManager, ProcBuilder}
+import org.apache.kyuubi.engine.{ApplicationManagerInfo, KyuubiApplicationManager, ProcBuilder}
import org.apache.kyuubi.engine.flink.FlinkProcessBuilder._
import org.apache.kyuubi.operation.log.OperationLog
@@ -50,88 +50,178 @@ class FlinkProcessBuilder(
val flinkHome: String = getEngineHome(shortName)
+ val flinkExecutable: String = {
+ Paths.get(flinkHome, "bin", FLINK_EXEC_FILE).toFile.getCanonicalPath
+ }
+
+  // flink.execution.target is required in Kyuubi conf currently
+ val executionTarget: Option[String] = conf.getOption("flink.execution.target")
+
override protected def module: String = "kyuubi-flink-sql-engine"
override protected def mainClass: String = "org.apache.kyuubi.engine.flink.FlinkSQLEngine"
override def env: Map[String, String] = conf.getEnvs +
- (FLINK_PROXY_USER_KEY -> proxyUser)
+ ("FLINK_CONF_DIR" -> conf.getEnvs.getOrElse(
+ "FLINK_CONF_DIR",
+ s"$flinkHome${File.separator}conf"))
+
+ override def clusterManager(): Option[String] = {
+ executionTarget match {
+ case Some("yarn-application") => Some("yarn")
+ case _ => None
+ }
+ }
+
+ override def appMgrInfo(): ApplicationManagerInfo = {
+ ApplicationManagerInfo(clusterManager())
+ }
override protected val commands: Array[String] = {
KyuubiApplicationManager.tagApplication(engineRefId, shortName, clusterManager(), conf)
- val buffer = new ArrayBuffer[String]()
- buffer += executable
-
- val memory = conf.get(ENGINE_FLINK_MEMORY)
- buffer += s"-Xmx$memory"
- val javaOptions = conf.get(ENGINE_FLINK_JAVA_OPTIONS)
- if (javaOptions.isDefined) {
- buffer += javaOptions.get
- }
+ // unset engine credentials because Flink doesn't support them at the moment
+ conf.unset(KyuubiReservedKeys.KYUUBI_ENGINE_CREDENTIALS_KEY)
+    // flink.execution.target is required in Kyuubi conf currently
+ executionTarget match {
+ case Some("yarn-application") =>
+ val buffer = new ArrayBuffer[String]()
+ buffer += flinkExecutable
+ buffer += "run-application"
+
+ val flinkExtraJars = new ListBuffer[String]
+ // locate flink sql jars
+ val flinkSqlJars = Paths.get(flinkHome)
+ .resolve("opt")
+ .toFile
+ .listFiles(new FilenameFilter {
+ override def accept(dir: File, name: String): Boolean = {
+ name.toLowerCase.startsWith("flink-sql-client") ||
+ name.toLowerCase.startsWith("flink-sql-gateway")
+ }
+ }).map(f => f.getAbsolutePath).sorted
+ flinkExtraJars ++= flinkSqlJars
+
+ val userJars = conf.get(ENGINE_FLINK_APPLICATION_JARS)
+ userJars.foreach(jars => flinkExtraJars ++= jars.split(","))
+
+ val hiveConfDirOpt = env.get("HIVE_CONF_DIR")
+ hiveConfDirOpt.foreach { hiveConfDir =>
+ val hiveConfFile = Paths.get(hiveConfDir).resolve("hive-site.xml")
+ if (!Files.exists(hiveConfFile)) {
+            throw new KyuubiException(s"The file $hiveConfFile does not exist. " +
+              s"Please put hive-site.xml under HIVE_CONF_DIR ($hiveConfDir) when it is configured.")
+ }
+ flinkExtraJars += s"$hiveConfFile"
+ }
- buffer += "-cp"
- val classpathEntries = new java.util.LinkedHashSet[String]
- // flink engine runtime jar
- mainResource.foreach(classpathEntries.add)
- // flink sql client jar
- val flinkSqlClientPath = Paths.get(flinkHome)
- .resolve("opt")
- .toFile
- .listFiles(new FilenameFilter {
- override def accept(dir: File, name: String): Boolean = {
- name.toLowerCase.startsWith("flink-sql-client")
+ buffer += "-t"
+ buffer += "yarn-application"
+ buffer += s"-Dyarn.ship-files=${flinkExtraJars.mkString(";")}"
+ buffer += s"-Dyarn.application.name=${conf.getOption(APP_KEY).get}"
+ buffer += s"-Dyarn.tags=${conf.getOption(YARN_TAG_KEY).get}"
+ buffer += "-Dcontainerized.master.env.FLINK_CONF_DIR=."
+
+ hiveConfDirOpt.foreach { _ =>
+ buffer += "-Dcontainerized.master.env.HIVE_CONF_DIR=."
}
- }).head.getAbsolutePath
- classpathEntries.add(flinkSqlClientPath)
-
- // jars from flink lib
- classpathEntries.add(s"$flinkHome${File.separator}lib${File.separator}*")
-
- // classpath contains flink configurations, default to flink.home/conf
- classpathEntries.add(env.getOrElse("FLINK_CONF_DIR", s"$flinkHome${File.separator}conf"))
- // classpath contains hadoop configurations
- env.get("HADOOP_CONF_DIR").foreach(classpathEntries.add)
- env.get("YARN_CONF_DIR").foreach(classpathEntries.add)
- env.get("HBASE_CONF_DIR").foreach(classpathEntries.add)
- val hadoopCp = env.get(FLINK_HADOOP_CLASSPATH_KEY)
- hadoopCp.foreach(classpathEntries.add)
- val extraCp = conf.get(ENGINE_FLINK_EXTRA_CLASSPATH)
- extraCp.foreach(classpathEntries.add)
- if (hadoopCp.isEmpty && extraCp.isEmpty) {
- warn(s"The conf of ${FLINK_HADOOP_CLASSPATH_KEY} and ${ENGINE_FLINK_EXTRA_CLASSPATH.key}" +
- s" is empty.")
- debug("Detected development environment")
- mainResource.foreach { path =>
- val devHadoopJars = Paths.get(path).getParent
- .resolve(s"scala-$SCALA_COMPILE_VERSION")
- .resolve("jars")
- if (!Files.exists(devHadoopJars)) {
- throw new KyuubiException(s"The path $devHadoopJars does not exists. " +
- s"Please set ${FLINK_HADOOP_CLASSPATH_KEY} or ${ENGINE_FLINK_EXTRA_CLASSPATH.key} " +
- s"for configuring location of hadoop client jars, etc")
+
+ val customFlinkConf = conf.getAllWithPrefix("flink", "")
+ customFlinkConf.filter(_._1 != "app.name").foreach { case (k, v) =>
+ buffer += s"-D$k=$v"
}
- classpathEntries.add(s"$devHadoopJars${File.separator}*")
- }
- }
- buffer += classpathEntries.asScala.mkString(File.pathSeparator)
- buffer += mainClass
- buffer += "--conf"
- buffer += s"$KYUUBI_SESSION_USER_KEY=$proxyUser"
+ buffer += "-c"
+ buffer += s"$mainClass"
+ buffer += s"${mainResource.get}"
+
+ buffer += "--conf"
+ buffer += s"$KYUUBI_SESSION_USER_KEY=$proxyUser"
+ conf.getAll.foreach { case (k, v) =>
+ if (k.startsWith("kyuubi.")) {
+ buffer += "--conf"
+ buffer += s"$k=$v"
+ }
+ }
+
+ buffer.toArray
+
+ case _ =>
+ val buffer = new ArrayBuffer[String]()
+ buffer += executable
+
+ val memory = conf.get(ENGINE_FLINK_MEMORY)
+ buffer += s"-Xmx$memory"
+ val javaOptions = conf.get(ENGINE_FLINK_JAVA_OPTIONS)
+ if (javaOptions.isDefined) {
+ buffer += javaOptions.get
+ }
+
+ buffer += "-cp"
+ val classpathEntries = new java.util.LinkedHashSet[String]
+ // flink engine runtime jar
+ mainResource.foreach(classpathEntries.add)
+ // flink sql jars
+ Paths.get(flinkHome)
+ .resolve("opt")
+ .toFile
+ .listFiles(new FilenameFilter {
+ override def accept(dir: File, name: String): Boolean = {
+ name.toLowerCase.startsWith("flink-sql-client") ||
+ name.toLowerCase.startsWith("flink-sql-gateway")
+ }
+ }).sorted.foreach(jar => classpathEntries.add(jar.getAbsolutePath))
+
+ // jars from flink lib
+ classpathEntries.add(s"$flinkHome${File.separator}lib${File.separator}*")
+
+ // classpath contains flink configurations, default to flink.home/conf
+ classpathEntries.add(env.getOrElse("FLINK_CONF_DIR", s"$flinkHome${File.separator}conf"))
+ // classpath contains hadoop configurations
+ env.get("HADOOP_CONF_DIR").foreach(classpathEntries.add)
+ env.get("YARN_CONF_DIR").foreach(classpathEntries.add)
+ env.get("HBASE_CONF_DIR").foreach(classpathEntries.add)
+ env.get("HIVE_CONF_DIR").foreach(classpathEntries.add)
+ val hadoopCp = env.get(FLINK_HADOOP_CLASSPATH_KEY)
+ hadoopCp.foreach(classpathEntries.add)
+ val extraCp = conf.get(ENGINE_FLINK_EXTRA_CLASSPATH)
+ extraCp.foreach(classpathEntries.add)
+ if (hadoopCp.isEmpty && extraCp.isEmpty) {
+            warn(s"Neither ${FLINK_HADOOP_CLASSPATH_KEY} nor " +
+              s"${ENGINE_FLINK_EXTRA_CLASSPATH.key} is configured.")
+ debug("Detected development environment.")
+ mainResource.foreach { path =>
+ val devHadoopJars = Paths.get(path).getParent
+ .resolve(s"scala-$SCALA_COMPILE_VERSION")
+ .resolve("jars")
+ if (!Files.exists(devHadoopJars)) {
+              throw new KyuubiException(s"The path $devHadoopJars does not exist. " +
+                s"Please set ${FLINK_HADOOP_CLASSPATH_KEY} or ${ENGINE_FLINK_EXTRA_CLASSPATH.key}" +
+                s" to configure the location of the hadoop client jars.")
+ }
+ classpathEntries.add(s"$devHadoopJars${File.separator}*")
+ }
+ }
+ buffer += classpathEntries.asScala.mkString(File.pathSeparator)
+ buffer += mainClass
- for ((k, v) <- conf.getAll) {
- buffer += "--conf"
- buffer += s"$k=$v"
+ buffer += "--conf"
+ buffer += s"$KYUUBI_SESSION_USER_KEY=$proxyUser"
+
+ conf.getAll.foreach { case (k, v) =>
+ buffer += "--conf"
+ buffer += s"$k=$v"
+ }
+ buffer.toArray
}
- buffer.toArray
}
override def shortName: String = "flink"
}
object FlinkProcessBuilder {
- final val APP_KEY = "yarn.application.name"
- final val TAG_KEY = "yarn.tags"
+ final val FLINK_EXEC_FILE = "flink"
+ final val APP_KEY = "flink.app.name"
+ final val YARN_TAG_KEY = "yarn.tags"
final val FLINK_HADOOP_CLASSPATH_KEY = "FLINK_HADOOP_CLASSPATH"
final val FLINK_PROXY_USER_KEY = "HADOOP_PROXY_USER"
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala
index e86597c5cc4..61fe55887ea 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala
@@ -29,7 +29,7 @@ import com.google.common.annotations.VisibleForTesting
import org.apache.kyuubi._
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf.{ENGINE_HIVE_EXTRA_CLASSPATH, ENGINE_HIVE_JAVA_OPTIONS, ENGINE_HIVE_MEMORY}
-import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
+import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_ID, KYUUBI_SESSION_USER_KEY}
import org.apache.kyuubi.engine.{KyuubiApplicationManager, ProcBuilder}
import org.apache.kyuubi.engine.hive.HiveProcessBuilder._
import org.apache.kyuubi.operation.log.OperationLog
@@ -106,6 +106,8 @@ class HiveProcessBuilder(
buffer += "--conf"
buffer += s"$KYUUBI_SESSION_USER_KEY=$proxyUser"
+ buffer += "--conf"
+ buffer += s"$KYUUBI_ENGINE_ID=$engineRefId"
for ((k, v) <- conf.getAll) {
buffer += "--conf"
@@ -121,4 +123,5 @@ class HiveProcessBuilder(
object HiveProcessBuilder {
final val HIVE_HADOOP_CLASSPATH_KEY = "HIVE_HADOOP_CLASSPATH"
+ final val HIVE_ENGINE_NAME = "hive.engine.name"
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkBatchProcessBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkBatchProcessBuilder.scala
index 4a613278dcb..ef159bb93ad 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkBatchProcessBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkBatchProcessBuilder.scala
@@ -36,7 +36,7 @@ class SparkBatchProcessBuilder(
extends SparkProcessBuilder(proxyUser, conf, batchId, extraEngineLog) {
import SparkProcessBuilder._
- override protected val commands: Array[String] = {
+ override protected lazy val commands: Array[String] = {
val buffer = new ArrayBuffer[String]()
buffer += executable
Option(mainClass).foreach { cla =>
@@ -51,7 +51,10 @@ class SparkBatchProcessBuilder(
// tag batch application
KyuubiApplicationManager.tagApplication(batchId, "spark", clusterManager(), batchKyuubiConf)
- (batchKyuubiConf.getAll ++ sparkAppNameConf()).foreach { case (k, v) =>
+ (batchKyuubiConf.getAll ++
+ sparkAppNameConf() ++
+ engineLogPathConf() ++
+ appendPodNameConf(batchConf)).foreach { case (k, v) =>
buffer += CONF
buffer += s"${convertConfigKey(k)}=$v"
}
@@ -77,4 +80,12 @@ class SparkBatchProcessBuilder(
override def clusterManager(): Option[String] = {
batchConf.get(MASTER_KEY).orElse(super.clusterManager())
}
+
+ override def kubernetesContext(): Option[String] = {
+ batchConf.get(KUBERNETES_CONTEXT_KEY).orElse(super.kubernetesContext())
+ }
+
+ override def kubernetesNamespace(): Option[String] = {
+ batchConf.get(KUBERNETES_NAMESPACE_KEY).orElse(super.kubernetesNamespace())
+ }
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala
index b74eab77d05..351eddb7567 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala
@@ -19,7 +19,9 @@ package org.apache.kyuubi.engine.spark
import java.io.{File, IOException}
import java.nio.file.Paths
+import java.util.Locale
+import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import com.google.common.annotations.VisibleForTesting
@@ -27,11 +29,13 @@ import org.apache.hadoop.security.UserGroupInformation
import org.apache.kyuubi._
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.{KyuubiApplicationManager, ProcBuilder}
+import org.apache.kyuubi.engine.{ApplicationManagerInfo, KyuubiApplicationManager, ProcBuilder}
import org.apache.kyuubi.engine.KubernetesApplicationOperation.{KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT}
+import org.apache.kyuubi.engine.ProcBuilder.KYUUBI_ENGINE_LOG_PATH_KEY
import org.apache.kyuubi.ha.HighAvailabilityConf
import org.apache.kyuubi.ha.client.AuthTypes
import org.apache.kyuubi.operation.log.OperationLog
+import org.apache.kyuubi.util.KubernetesUtils
import org.apache.kyuubi.util.Validator
class SparkProcessBuilder(
@@ -98,7 +102,7 @@ class SparkProcessBuilder(
}
}
- override protected val commands: Array[String] = {
+ override protected lazy val commands: Array[String] = {
// complete `spark.master` if absent on kubernetes
completeMasterUrl(conf)
@@ -115,8 +119,8 @@ class SparkProcessBuilder(
== AuthTypes.KERBEROS) {
allConf = allConf ++ zkAuthKeytabFileConf(allConf)
}
-
- allConf.foreach { case (k, v) =>
+ // pass spark engine log path to spark conf
+ (allConf ++ engineLogPathConf ++ appendPodNameConf(allConf)).foreach { case (k, v) =>
buffer += CONF
buffer += s"${convertConfigKey(k)}=$v"
}
@@ -183,26 +187,69 @@ class SparkProcessBuilder(
override def shortName: String = "spark"
- protected lazy val defaultMaster: Option[String] = {
+ protected lazy val defaultsConf: Map[String, String] = {
val confDir = env.getOrElse(SPARK_CONF_DIR, s"$sparkHome${File.separator}conf")
- val defaults =
- try {
- val confFile = new File(s"$confDir${File.separator}$SPARK_CONF_FILE_NAME")
- if (confFile.exists()) {
- Utils.getPropertiesFromFile(Some(confFile))
- } else {
- Map.empty[String, String]
+ try {
+ val confFile = new File(s"$confDir${File.separator}$SPARK_CONF_FILE_NAME")
+ if (confFile.exists()) {
+ Utils.getPropertiesFromFile(Some(confFile))
+ } else {
+ Map.empty[String, String]
+ }
+ } catch {
+ case _: Exception =>
+ warn(s"Failed to load spark configurations from $confDir")
+ Map.empty[String, String]
+ }
+ }
+
+ override def appMgrInfo(): ApplicationManagerInfo = {
+ ApplicationManagerInfo(
+ clusterManager(),
+ kubernetesContext(),
+ kubernetesNamespace())
+ }
+
+ def appendPodNameConf(conf: Map[String, String]): Map[String, String] = {
+ val appName = conf.getOrElse(APP_KEY, "spark")
+ val map = mutable.Map.newBuilder[String, String]
+ if (clusterManager().exists(cm => cm.toLowerCase(Locale.ROOT).startsWith("k8s"))) {
+ if (!conf.contains(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)) {
+ val prefix = KubernetesUtils.generateExecutorPodNamePrefix(appName, engineRefId)
+ map += (KUBERNETES_EXECUTOR_POD_NAME_PREFIX -> prefix)
+ }
+ if (deployMode().exists(_.toLowerCase(Locale.ROOT) == "cluster")) {
+ if (!conf.contains(KUBERNETES_DRIVER_POD_NAME)) {
+ val name = KubernetesUtils.generateDriverPodName(appName, engineRefId)
+ map += (KUBERNETES_DRIVER_POD_NAME -> name)
}
- } catch {
- case _: Exception =>
- warn(s"Failed to load spark configurations from $confDir")
- Map.empty[String, String]
}
- defaults.get(MASTER_KEY)
+ }
+ map.result().toMap
}
override def clusterManager(): Option[String] = {
- conf.getOption(MASTER_KEY).orElse(defaultMaster)
+ conf.getOption(MASTER_KEY).orElse(defaultsConf.get(MASTER_KEY))
+ }
+
+ def deployMode(): Option[String] = {
+ conf.getOption(DEPLOY_MODE_KEY).orElse(defaultsConf.get(DEPLOY_MODE_KEY))
+ }
+
+ override def isClusterMode(): Boolean = {
+ clusterManager().map(_.toLowerCase(Locale.ROOT)) match {
+ case Some(m) if m.startsWith("yarn") || m.startsWith("k8s") =>
+ deployMode().exists(_.toLowerCase(Locale.ROOT) == "cluster")
+ case _ => false
+ }
+ }
+
+ def kubernetesContext(): Option[String] = {
+ conf.getOption(KUBERNETES_CONTEXT_KEY).orElse(defaultsConf.get(KUBERNETES_CONTEXT_KEY))
+ }
+
+ def kubernetesNamespace(): Option[String] = {
+ conf.getOption(KUBERNETES_NAMESPACE_KEY).orElse(defaultsConf.get(KUBERNETES_NAMESPACE_KEY))
}
override def validateConf: Unit = Validator.validateConf(conf)
@@ -218,12 +265,21 @@ class SparkProcessBuilder(
}
}
}
+
+ private[spark] def engineLogPathConf(): Map[String, String] = {
+ Map(KYUUBI_ENGINE_LOG_PATH_KEY -> engineLog.getAbsolutePath)
+ }
}
object SparkProcessBuilder {
final val APP_KEY = "spark.app.name"
final val TAG_KEY = "spark.yarn.tags"
final val MASTER_KEY = "spark.master"
+ final val DEPLOY_MODE_KEY = "spark.submit.deployMode"
+ final val KUBERNETES_CONTEXT_KEY = "spark.kubernetes.context"
+ final val KUBERNETES_NAMESPACE_KEY = "spark.kubernetes.namespace"
+ final val KUBERNETES_DRIVER_POD_NAME = "spark.kubernetes.driver.pod.name"
+ final val KUBERNETES_EXECUTOR_POD_NAME_PREFIX = "spark.kubernetes.executor.podNamePrefix"
final val INTERNAL_RESOURCE = "spark-internal"
/**
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/trino/TrinoProcessBuilder.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/trino/TrinoProcessBuilder.scala
index 7b68e464aa9..041219dd0fb 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/trino/TrinoProcessBuilder.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/trino/TrinoProcessBuilder.scala
@@ -27,6 +27,7 @@ import scala.collection.mutable.ArrayBuffer
import com.google.common.annotations.VisibleForTesting
import org.apache.kyuubi.{Logging, SCALA_COMPILE_VERSION, Utils}
+import org.apache.kyuubi.Utils.REDACTION_REPLACEMENT_TEXT
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.config.KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY
@@ -108,5 +109,19 @@ class TrinoProcessBuilder(
override def shortName: String = "trino"
- override def toString: String = Utils.redactCommandLineArgs(conf, commands).mkString("\n")
+ override def toString: String = {
+ if (commands == null) {
+ super.toString()
+ } else {
+ Utils.redactCommandLineArgs(conf, commands).map {
+ case arg if arg.contains(ENGINE_TRINO_CONNECTION_PASSWORD.key) =>
+ s"${ENGINE_TRINO_CONNECTION_PASSWORD.key}=$REDACTION_REPLACEMENT_TEXT"
+ case arg if arg.contains(ENGINE_TRINO_CONNECTION_KEYSTORE_PASSWORD.key) =>
+ s"${ENGINE_TRINO_CONNECTION_KEYSTORE_PASSWORD.key}=$REDACTION_REPLACEMENT_TEXT"
+ case arg if arg.contains(ENGINE_TRINO_CONNECTION_TRUSTSTORE_PASSWORD.key) =>
+ s"${ENGINE_TRINO_CONNECTION_TRUSTSTORE_PASSWORD.key}=$REDACTION_REPLACEMENT_TEXT"
+ case arg => arg
+ }.mkString("\n")
+ }
+ }
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/events/KyuubiOperationEvent.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/KyuubiOperationEvent.scala
index 7147cb42450..2a103213e5a 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/events/KyuubiOperationEvent.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/KyuubiOperationEvent.scala
@@ -43,6 +43,7 @@ import org.apache.kyuubi.session.KyuubiSession
* @param sessionUser the authenticated client user
* @param sessionType the type of the parent session
* @param kyuubiInstance the parent session connection url
+ * @param metrics the operation metrics
*/
case class KyuubiOperationEvent private (
statementId: String,
@@ -58,7 +59,8 @@ case class KyuubiOperationEvent private (
sessionId: String,
sessionUser: String,
sessionType: String,
- kyuubiInstance: String) extends KyuubiEvent {
+ kyuubiInstance: String,
+ metrics: Map[String, String]) extends KyuubiEvent {
// operation events are partitioned by the date when the corresponding operations are
// created.
@@ -88,6 +90,7 @@ object KyuubiOperationEvent {
session.handle.identifier.toString,
session.user,
session.sessionType.toString,
- session.connectionUrl)
+ session.connectionUrl,
+ operation.metrics)
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/events/ServerEventHandlerRegister.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/ServerEventHandlerRegister.scala
index 4ddee48ddfd..ca6c776ac8c 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/events/ServerEventHandlerRegister.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/ServerEventHandlerRegister.scala
@@ -19,8 +19,9 @@ package org.apache.kyuubi.events
import java.net.InetAddress
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.config.KyuubiConf.{SERVER_EVENT_JSON_LOG_PATH, SERVER_EVENT_LOGGERS}
-import org.apache.kyuubi.events.handler.{EventHandler, ServerJsonLoggingEventHandler}
+import org.apache.kyuubi.config.KyuubiConf._
+import org.apache.kyuubi.events.handler.{EventHandler, ServerJsonLoggingEventHandler, ServerKafkaLoggingEventHandler}
+import org.apache.kyuubi.events.handler.ServerKafkaLoggingEventHandler.KAFKA_SERVER_EVENT_HANDLER_PREFIX
import org.apache.kyuubi.util.KyuubiHadoopUtils
object ServerEventHandlerRegister extends EventHandlerRegister {
@@ -36,6 +37,22 @@ object ServerEventHandlerRegister extends EventHandlerRegister {
kyuubiConf)
}
+ override def createKafkaEventHandler(kyuubiConf: KyuubiConf): EventHandler[KyuubiEvent] = {
+ val topic = kyuubiConf.get(SERVER_EVENT_KAFKA_TOPIC).getOrElse {
+ throw new IllegalArgumentException(s"${SERVER_EVENT_KAFKA_TOPIC.key} must be configured")
+ }
+ val closeTimeoutInMs = kyuubiConf.get(SERVER_EVENT_KAFKA_CLOSE_TIMEOUT)
+ val kafkaEventHandlerProducerConf =
+ kyuubiConf.getAllWithPrefix(KAFKA_SERVER_EVENT_HANDLER_PREFIX, "")
+ .filterKeys(
+ !List(SERVER_EVENT_KAFKA_TOPIC, SERVER_EVENT_KAFKA_CLOSE_TIMEOUT).map(_.key).contains(_))
+ ServerKafkaLoggingEventHandler(
+ topic,
+ kafkaEventHandlerProducerConf,
+ kyuubiConf,
+ closeTimeoutInMs)
+ }
+
override protected def getLoggers(conf: KyuubiConf): Seq[String] = {
conf.get(SERVER_EVENT_LOGGERS)
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/events/handler/ServerKafkaLoggingEventHandler.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/handler/ServerKafkaLoggingEventHandler.scala
new file mode 100644
index 00000000000..08f8b0d7944
--- /dev/null
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/events/handler/ServerKafkaLoggingEventHandler.scala
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.events.handler
+
+import org.apache.kyuubi.config.KyuubiConf
+
+case class ServerKafkaLoggingEventHandler(
+ topic: String,
+ producerConf: Iterable[(String, String)],
+ kyuubiConf: KyuubiConf,
+ closeTimeoutInMs: Long)
+ extends KafkaLoggingEventHandler(topic, producerConf, kyuubiConf, closeTimeoutInMs)
+
+object ServerKafkaLoggingEventHandler {
+ val KAFKA_SERVER_EVENT_HANDLER_PREFIX = "kyuubi.backend.server.event.kafka"
+}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/BatchJobSubmission.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/BatchJobSubmission.scala
index 3cbb16907bc..779dc48ae6a 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/BatchJobSubmission.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/BatchJobSubmission.scala
@@ -26,7 +26,7 @@ import com.codahale.metrics.MetricRegistry
import com.google.common.annotations.VisibleForTesting
import org.apache.hive.service.rpc.thrift._
-import org.apache.kyuubi.{KyuubiException, KyuubiSQLException}
+import org.apache.kyuubi.{KyuubiException, KyuubiSQLException, Utils}
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.engine.{ApplicationInfo, ApplicationState, KillResponse, ProcBuilder}
import org.apache.kyuubi.engine.spark.SparkBatchProcessBuilder
@@ -36,7 +36,7 @@ import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
import org.apache.kyuubi.operation.OperationState.{isTerminal, CANCELED, OperationState, RUNNING}
import org.apache.kyuubi.operation.log.OperationLog
import org.apache.kyuubi.server.metadata.api.Metadata
-import org.apache.kyuubi.session.KyuubiBatchSessionImpl
+import org.apache.kyuubi.session.KyuubiBatchSession
/**
* The state of batch operation is special. In general, the lifecycle of state is:
@@ -51,14 +51,14 @@ import org.apache.kyuubi.session.KyuubiBatchSessionImpl
* user close the batch session that means the final status is CANCELED.
*/
class BatchJobSubmission(
- session: KyuubiBatchSessionImpl,
+ session: KyuubiBatchSession,
val batchType: String,
val batchName: String,
resource: String,
className: String,
batchConf: Map[String, String],
batchArgs: Seq[String],
- recoveryMetadata: Option[Metadata])
+ metadata: Option[Metadata])
extends KyuubiApplicationOperation(session) {
import BatchJobSubmission._
@@ -71,16 +71,16 @@ class BatchJobSubmission(
private[kyuubi] val batchId: String = session.handle.identifier.toString
@volatile private var _applicationInfo: Option[ApplicationInfo] = None
- def getOrFetchCurrentApplicationInfo: Option[ApplicationInfo] = _applicationInfo match {
- case Some(_) => _applicationInfo
- case None => currentApplicationInfo
- }
+ def getApplicationInfo: Option[ApplicationInfo] = _applicationInfo
private var killMessage: KillResponse = (false, "UNKNOWN")
def getKillMessage: KillResponse = killMessage
- @volatile private var _appStartTime = recoveryMetadata.map(_.engineOpenTime).getOrElse(0L)
+ @volatile private var _appStartTime = metadata.map(_.engineOpenTime).getOrElse(0L)
def appStartTime: Long = _appStartTime
+ def appStarted: Boolean = _appStartTime > 0
+
+ private lazy val _submitTime = if (appStarted) _appStartTime else System.currentTimeMillis
@VisibleForTesting
private[kyuubi] val builder: ProcBuilder = {
@@ -102,20 +102,17 @@ class BatchJobSubmission(
}
}
- override protected def currentApplicationInfo: Option[ApplicationInfo] = {
- if (isTerminal(state) && _applicationInfo.nonEmpty) return _applicationInfo
- // only the ApplicationInfo with non-empty id is valid for the operation
- val submitTime = if (_appStartTime <= 0) {
- System.currentTimeMillis()
- } else {
- _appStartTime
+ override def currentApplicationInfo(): Option[ApplicationInfo] = {
+ if (isTerminal(state) && _applicationInfo.map(_.state).exists(ApplicationState.isTerminated)) {
+ return _applicationInfo
}
val applicationInfo =
applicationManager.getApplicationInfo(
- builder.clusterManager(),
+ builder.appMgrInfo(),
batchId,
- Some(submitTime)).filter(_.id != null)
- applicationInfo.foreach { _ =>
+ Some(session.user),
+ Some(_submitTime))
+ applicationId(applicationInfo).foreach { _ =>
if (_appStartTime <= 0) {
_appStartTime = System.currentTimeMillis()
}
@@ -123,8 +120,12 @@ class BatchJobSubmission(
applicationInfo
}
+ private def applicationId(applicationInfo: Option[ApplicationInfo]): Option[String] = {
+ applicationInfo.filter(_.id != null).map(_.id).orElse(None)
+ }
+
private[kyuubi] def killBatchApplication(): KillResponse = {
- applicationManager.killApplication(builder.clusterManager(), batchId)
+ applicationManager.killApplication(builder.appMgrInfo(), batchId, Some(session.user))
}
private val applicationCheckInterval =
@@ -132,31 +133,26 @@ class BatchJobSubmission(
private val applicationStarvationTimeout =
session.sessionConf.get(KyuubiConf.BATCH_APPLICATION_STARVATION_TIMEOUT)
+ private val applicationStartupDestroyTimeout =
+ session.sessionConf.get(KyuubiConf.SESSION_ENGINE_STARTUP_DESTROY_TIMEOUT)
+
private def updateBatchMetadata(): Unit = {
- val endTime =
- if (isTerminalState(state)) {
- lastAccessTime
- } else {
- 0L
- }
+ val endTime = if (isTerminalState(state)) lastAccessTime else 0L
- if (isTerminalState(state)) {
- if (_applicationInfo.isEmpty) {
- _applicationInfo =
- Option(ApplicationInfo(id = null, name = null, state = ApplicationState.NOT_FOUND))
- }
+ if (isTerminalState(state) && _applicationInfo.isEmpty) {
+ _applicationInfo = Some(ApplicationInfo.NOT_FOUND)
}
- _applicationInfo.foreach { status =>
+ _applicationInfo.foreach { appInfo =>
val metadataToUpdate = Metadata(
identifier = batchId,
state = state.toString,
engineOpenTime = appStartTime,
- engineId = status.id,
- engineName = status.name,
- engineUrl = status.url.orNull,
- engineState = status.state.toString,
- engineError = status.error,
+ engineId = appInfo.id,
+ engineName = appInfo.name,
+ engineUrl = appInfo.url.orNull,
+ engineState = appInfo.state.toString,
+ engineError = appInfo.error,
endTime = endTime)
session.sessionManager.updateMetadata(metadataToUpdate)
}
@@ -165,11 +161,11 @@ class BatchJobSubmission(
override def getOperationLog: Option[OperationLog] = Option(_operationLog)
// we can not set to other state if it is canceled
- private def setStateIfNotCanceled(newState: OperationState): Unit = state.synchronized {
+ private def setStateIfNotCanceled(newState: OperationState): Unit = withLockRequired {
if (state != CANCELED) {
setState(newState)
- _applicationInfo.filter(_.id != null).foreach { ai =>
- session.getSessionEvent.foreach(_.engineId = ai.id)
+ applicationId(_applicationInfo).foreach { appId =>
+ session.getSessionEvent.foreach(_.engineId = appId)
}
if (newState == RUNNING) {
session.onEngineOpened()
@@ -190,31 +186,27 @@ class BatchJobSubmission(
override protected def runInternal(): Unit = session.handleSessionException {
val asyncOperation: Runnable = () => {
try {
- if (recoveryMetadata.exists(_.peerInstanceClosed)) {
- setState(OperationState.CANCELED)
- } else {
- // If it is in recovery mode, only re-submit batch job if previous state is PENDING and
- // fail to fetch the status including appId from resource manager. Otherwise, monitor the
- // submitted batch application.
- recoveryMetadata.map { metadata =>
- if (metadata.state == OperationState.PENDING.toString) {
- _applicationInfo = currentApplicationInfo
- _applicationInfo.map(_.id) match {
- case Some(null) =>
- submitAndMonitorBatchJob()
- case Some(appId) =>
- monitorBatchJob(appId)
- case None =>
- submitAndMonitorBatchJob()
- }
- } else {
- monitorBatchJob(metadata.engineId)
+ metadata match {
+ case Some(metadata) if metadata.peerInstanceClosed =>
+ setState(OperationState.CANCELED)
+ case Some(metadata) if metadata.state == OperationState.PENDING.toString =>
+ // case 1: new batch job created using batch impl v2
+          // case 2: batch job from recovery, do submission only when the previous state is
+          // PENDING and the status cannot be fetched by appId from the resource manager,
+          // which is similar to case 1; otherwise, monitor the submitted batch application.
+ _applicationInfo = currentApplicationInfo()
+ applicationId(_applicationInfo) match {
+ case None => submitAndMonitorBatchJob()
+ case Some(appId) => monitorBatchJob(appId)
}
- }.getOrElse {
+ case Some(metadata) =>
+ // batch job from recovery which was submitted
+ monitorBatchJob(metadata.engineId)
+ case None =>
+ // brand-new job created using batch impl v1
submitAndMonitorBatchJob()
- }
- setStateIfNotCanceled(OperationState.FINISHED)
}
+ setStateIfNotCanceled(OperationState.FINISHED)
} catch {
onError()
} finally {
@@ -240,10 +232,11 @@ class BatchJobSubmission(
try {
info(s"Submitting $batchType batch[$batchId] job:\n$builder")
val process = builder.start
- _applicationInfo = currentApplicationInfo
while (!applicationFailed(_applicationInfo) && process.isAlive) {
+ updateApplicationInfoMetadataIfNeeded()
if (!appStatusFirstUpdated) {
- if (_applicationInfo.isDefined) {
+        // only an ApplicationInfo with a non-empty id indicates that the batch is RUNNING
+ if (applicationId(_applicationInfo).isDefined) {
setStateIfNotCanceled(OperationState.RUNNING)
updateBatchMetadata()
appStatusFirstUpdated = true
@@ -257,25 +250,41 @@ class BatchJobSubmission(
}
}
process.waitFor(applicationCheckInterval, TimeUnit.MILLISECONDS)
- _applicationInfo = currentApplicationInfo
}
if (applicationFailed(_applicationInfo)) {
- process.destroyForcibly()
- throw new RuntimeException(s"Batch job failed: ${_applicationInfo}")
+ Utils.terminateProcess(process, applicationStartupDestroyTimeout)
+ throw new KyuubiException(s"Batch job failed: ${_applicationInfo}")
} else {
process.waitFor()
if (process.exitValue() != 0) {
throw new KyuubiException(s"Process exit with value ${process.exitValue()}")
}
- Option(_applicationInfo.map(_.id)).foreach {
+ while (!appStarted && applicationId(_applicationInfo).isEmpty &&
+ !applicationTerminated(_applicationInfo)) {
+ Thread.sleep(applicationCheckInterval)
+ updateApplicationInfoMetadataIfNeeded()
+ }
+
+ applicationId(_applicationInfo) match {
case Some(appId) => monitorBatchJob(appId)
- case _ =>
+ case None if !appStarted =>
+ throw new KyuubiException(s"$batchType batch[$batchId] job failed: ${_applicationInfo}")
+ case None =>
}
}
} finally {
- builder.close()
+ val waitCompletion = batchConf.get(KyuubiConf.SESSION_ENGINE_STARTUP_WAIT_COMPLETION.key)
+ .map(_.toBoolean).getOrElse(
+ session.sessionConf.get(KyuubiConf.SESSION_ENGINE_STARTUP_WAIT_COMPLETION))
+ val destroyProcess = !waitCompletion && builder.isClusterMode()
+ if (destroyProcess) {
+        info("Destroying the builder process because waitCompletion is false" +
+          " and the engine is running in cluster mode.")
+ }
+ builder.close(destroyProcess)
+ updateApplicationInfoMetadataIfNeeded()
cleanupUploadedResourceIfNeeded()
}
}
@@ -283,30 +292,37 @@ class BatchJobSubmission(
private def monitorBatchJob(appId: String): Unit = {
info(s"Monitoring submitted $batchType batch[$batchId] job: $appId")
if (_applicationInfo.isEmpty) {
- _applicationInfo = currentApplicationInfo
+ _applicationInfo = currentApplicationInfo()
}
if (state == OperationState.PENDING) {
setStateIfNotCanceled(OperationState.RUNNING)
}
if (_applicationInfo.isEmpty) {
info(s"The $batchType batch[$batchId] job: $appId not found, assume that it has finished.")
- } else if (applicationFailed(_applicationInfo)) {
- throw new RuntimeException(s"$batchType batch[$batchId] job failed: ${_applicationInfo}")
- } else {
- updateBatchMetadata()
- // TODO: add limit for max batch job submission lifetime
- while (_applicationInfo.isDefined && !applicationTerminated(_applicationInfo)) {
- Thread.sleep(applicationCheckInterval)
- val newApplicationStatus = currentApplicationInfo
- if (newApplicationStatus.map(_.state) != _applicationInfo.map(_.state)) {
- _applicationInfo = newApplicationStatus
- updateBatchMetadata()
- info(s"Batch report for $batchId, ${_applicationInfo}")
- }
- }
+ return
+ }
+ if (applicationFailed(_applicationInfo)) {
+ throw new KyuubiException(s"$batchType batch[$batchId] job failed: ${_applicationInfo}")
+ }
+ updateBatchMetadata()
+ // TODO: add limit for max batch job submission lifetime
+ while (_applicationInfo.isDefined && !applicationTerminated(_applicationInfo)) {
+ Thread.sleep(applicationCheckInterval)
+ updateApplicationInfoMetadataIfNeeded()
+ }
+ if (applicationFailed(_applicationInfo)) {
+ throw new KyuubiException(s"$batchType batch[$batchId] job failed: ${_applicationInfo}")
+ }
+ }
- if (applicationFailed(_applicationInfo)) {
- throw new RuntimeException(s"$batchType batch[$batchId] job failed: ${_applicationInfo}")
+ private def updateApplicationInfoMetadataIfNeeded(): Unit = {
+ if (applicationId(_applicationInfo).isEmpty ||
+ !_applicationInfo.map(_.state).exists(ApplicationState.isTerminated)) {
+ val newApplicationStatus = currentApplicationInfo()
+ if (newApplicationStatus.map(_.state) != _applicationInfo.map(_.state)) {
+ _applicationInfo = newApplicationStatus
+ updateBatchMetadata()
+ info(s"Batch report for $batchId, ${_applicationInfo}")
}
}
}
@@ -321,7 +337,7 @@ class BatchJobSubmission(
}
}
- override def close(): Unit = state.synchronized {
+ override def close(): Unit = withLockRequired {
if (!isClosedOrCanceled) {
try {
getOperationLog.foreach(_.close())
@@ -334,14 +350,14 @@ class BatchJobSubmission(
// fast fail
if (isTerminalState(state)) {
killMessage = (false, s"batch $batchId is already terminal so can not kill it.")
- builder.close()
+ builder.close(true)
cleanupUploadedResourceIfNeeded()
return
}
try {
killMessage = killBatchApplication()
- builder.close()
+ builder.close(true)
cleanupUploadedResourceIfNeeded()
} finally {
if (state == OperationState.INITIALIZED) {
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/ExecutedCommandExec.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/ExecutedCommandExec.scala
index 98065b8cbaf..70b727e5e67 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/ExecutedCommandExec.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/ExecutedCommandExec.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.operation
-import org.apache.hive.service.rpc.thrift.{TGetResultSetMetadataResp, TRowSet}
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TGetResultSetMetadataResp}
import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
import org.apache.kyuubi.operation.log.OperationLog
@@ -67,11 +67,17 @@ class ExecutedCommandExec(
if (!shouldRunAsync) getBackgroundHandle.get()
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
- command.getNextRowSet(order, rowSetSize, getProtocolVersion)
+ val rowSet = command.getNextRowSet(order, rowSetSize, getProtocolVersion)
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(rowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override def getResultSetMetadata: TGetResultSetMetadataResp = {
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiApplicationOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiApplicationOperation.scala
index 605c4cca6b8..93929c59cce 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiApplicationOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiApplicationOperation.scala
@@ -22,7 +22,7 @@ import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConverters._
-import org.apache.hive.service.rpc.thrift.{TColumn, TColumnDesc, TGetResultSetMetadataResp, TPrimitiveTypeEntry, TRow, TRowSet, TStringColumn, TTableSchema, TTypeDesc, TTypeEntry, TTypeId}
+import org.apache.hive.service.rpc.thrift.{TColumn, TColumnDesc, TFetchResultsResp, TGetResultSetMetadataResp, TPrimitiveTypeEntry, TRow, TRowSet, TStringColumn, TTableSchema, TTypeDesc, TTypeEntry, TTypeId}
import org.apache.kyuubi.engine.ApplicationInfo
import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
@@ -31,10 +31,10 @@ import org.apache.kyuubi.util.ThriftUtils
abstract class KyuubiApplicationOperation(session: Session) extends KyuubiOperation(session) {
- protected def currentApplicationInfo: Option[ApplicationInfo]
+ protected def currentApplicationInfo(): Option[ApplicationInfo]
protected def applicationInfoMap: Option[Map[String, String]] = {
- currentApplicationInfo.map(_.toMap)
+ currentApplicationInfo().map(_.toMap)
}
override def getResultSetMetadata: TGetResultSetMetadataResp = {
@@ -54,8 +54,11 @@ abstract class KyuubiApplicationOperation(session: Session) extends KyuubiOperat
resp
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
- applicationInfoMap.map { state =>
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
+ val resp = new TFetchResultsResp(OK_STATUS)
+ val rowSet = applicationInfoMap.map { state =>
val tRow = new TRowSet(0, new JArrayList[TRow](state.size))
Seq(state.keys, state.values.map(Option(_).getOrElse(""))).map(_.toSeq.asJava).foreach {
col =>
@@ -64,5 +67,8 @@ abstract class KyuubiApplicationOperation(session: Session) extends KyuubiOperat
}
tRow
}.getOrElse(ThriftUtils.EMPTY_ROW_SET)
+ resp.setResults(rowSet)
+ resp.setHasMoreRows(false)
+ resp
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperation.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperation.scala
index 106a11e4b25..83e19cb6579 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperation.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperation.scala
@@ -53,13 +53,28 @@ abstract class KyuubiOperation(session: Session) extends AbstractOperation(sessi
def remoteOpHandle(): TOperationHandle = _remoteOpHandle
+ @volatile protected var _fetchLogCount = 0L
+ @volatile protected var _fetchResultsCount = 0L
+
+ protected[kyuubi] def increaseFetchLogCount(count: Int): Unit = {
+ _fetchLogCount += count
+ }
+
+ protected[kyuubi] def increaseFetchResultsCount(count: Int): Unit = {
+ _fetchResultsCount += count
+ }
+
+ def metrics: Map[String, String] = Map(
+ "fetchLogCount" -> _fetchLogCount.toString,
+ "fetchResultsCount" -> _fetchResultsCount.toString)
+
protected def verifyTStatus(tStatus: TStatus): Unit = {
ThriftUtils.verifyTStatus(tStatus)
}
protected def onError(action: String = "operating"): PartialFunction[Throwable, Unit] = {
case e: Throwable =>
- state.synchronized {
+ withLockRequired {
if (isTerminalState(state)) {
warn(s"Ignore exception in terminal state with $statementId", e)
} else {
@@ -101,14 +116,14 @@ abstract class KyuubiOperation(session: Session) extends AbstractOperation(sessi
}
override protected def afterRun(): Unit = {
- state.synchronized {
+ withLockRequired {
if (!isTerminalState(state)) {
setState(OperationState.FINISHED)
}
}
}
- override def cancel(): Unit = state.synchronized {
+ override def cancel(): Unit = withLockRequired {
if (!isClosedOrCanceled) {
setState(OperationState.CANCELED)
MetricsSystem.tracing(_.decCount(MetricRegistry.name(OPERATION_OPEN, opType)))
@@ -123,17 +138,10 @@ abstract class KyuubiOperation(session: Session) extends AbstractOperation(sessi
}
}
- override def close(): Unit = state.synchronized {
+ override def close(): Unit = withLockRequired {
if (!isClosedOrCanceled) {
setState(OperationState.CLOSED)
MetricsSystem.tracing(_.decCount(MetricRegistry.name(OPERATION_OPEN, opType)))
- try {
- // For launch engine operation, we use OperationLog to pass engine submit log but
- // at that time we do not have remoteOpHandle
- getOperationLog.foreach(_.close())
- } catch {
- case e: IOException => error(e.getMessage, e)
- }
if (_remoteOpHandle != null) {
try {
client.closeOperation(_remoteOpHandle)
@@ -143,6 +151,13 @@ abstract class KyuubiOperation(session: Session) extends AbstractOperation(sessi
}
}
}
+ try {
+ // For launch engine operation, we use OperationLog to pass engine submit log but
+ // at that time we do not have remoteOpHandle
+ getOperationLog.foreach(_.close())
+ } catch {
+ case e: IOException => error(e.getMessage, e)
+ }
}
override def getResultSetMetadata: TGetResultSetMetadataResp = {
@@ -164,11 +179,17 @@ abstract class KyuubiOperation(session: Session) extends AbstractOperation(sessi
}
}
- override def getNextRowSet(order: FetchOrientation, rowSetSize: Int): TRowSet = {
+ override def getNextRowSetInternal(
+ order: FetchOrientation,
+ rowSetSize: Int): TFetchResultsResp = {
validateDefaultFetchOrientation(order)
assertState(OperationState.FINISHED)
setHasResultSet(true)
- client.fetchResults(_remoteOpHandle, order, rowSetSize, fetchLog = false)
+ val rowset = client.fetchResults(_remoteOpHandle, order, rowSetSize, fetchLog = false)
+ val resp = new TFetchResultsResp(OK_STATUS)
+ resp.setResults(rowset)
+ resp.setHasMoreRows(false)
+ resp
}
override def shouldRunAsync: Boolean = false
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperationManager.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperationManager.scala
index dd4889653cf..739c99cd78a 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperationManager.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperationManager.scala
@@ -19,7 +19,7 @@ package org.apache.kyuubi.operation
import java.util.concurrent.TimeUnit
-import org.apache.hive.service.rpc.thrift.TRowSet
+import org.apache.hive.service.rpc.thrift.{TFetchResultsResp, TStatus, TStatusCode}
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.config.KyuubiConf
@@ -28,7 +28,7 @@ import org.apache.kyuubi.metrics.MetricsConstants.OPERATION_OPEN
import org.apache.kyuubi.metrics.MetricsSystem
import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
import org.apache.kyuubi.server.metadata.api.Metadata
-import org.apache.kyuubi.session.{KyuubiBatchSessionImpl, KyuubiSessionImpl, Session}
+import org.apache.kyuubi.session.{KyuubiBatchSession, KyuubiSessionImpl, Session}
import org.apache.kyuubi.sql.plan.command.RunnableCommand
import org.apache.kyuubi.util.ThriftUtils
@@ -74,14 +74,14 @@ class KyuubiOperationManager private (name: String) extends OperationManager(nam
}
def newBatchJobSubmissionOperation(
- session: KyuubiBatchSessionImpl,
+ session: KyuubiBatchSession,
batchType: String,
batchName: String,
resource: String,
className: String,
batchConf: Map[String, String],
batchArgs: Seq[String],
- recoveryMetadata: Option[Metadata]): BatchJobSubmission = {
+ metadata: Option[Metadata]): BatchJobSubmission = {
val operation = new BatchJobSubmission(
session,
batchType,
@@ -90,7 +90,7 @@ class KyuubiOperationManager private (name: String) extends OperationManager(nam
className,
batchConf,
batchArgs,
- recoveryMetadata)
+ metadata)
addOperation(operation)
operation
}
@@ -212,12 +212,12 @@ class KyuubiOperationManager private (name: String) extends OperationManager(nam
override def getOperationLogRowSet(
opHandle: OperationHandle,
order: FetchOrientation,
- maxRows: Int): TRowSet = {
-
+ maxRows: Int): TFetchResultsResp = {
+ val resp = new TFetchResultsResp(new TStatus(TStatusCode.SUCCESS_STATUS))
val operation = getOperation(opHandle).asInstanceOf[KyuubiOperation]
val operationLog = operation.getOperationLog
- operationLog match {
- case Some(log) => log.read(maxRows)
+ val rowSet = operationLog match {
+ case Some(log) => log.read(order, maxRows)
case None =>
val remoteHandle = operation.remoteOpHandle()
val client = operation.client
@@ -227,6 +227,9 @@ class KyuubiOperationManager private (name: String) extends OperationManager(nam
ThriftUtils.EMPTY_ROW_SET
}
}
+ resp.setResults(rowSet)
+ resp.setHasMoreRows(false)
+ resp
}
override def start(): Unit = synchronized {
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/LaunchEngine.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/LaunchEngine.scala
index fb4f39e262b..758dccb9d1b 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/LaunchEngine.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/LaunchEngine.scala
@@ -33,7 +33,7 @@ class LaunchEngine(session: KyuubiSessionImpl, override val shouldRunAsync: Bool
}
override def getOperationLog: Option[OperationLog] = Option(_operationLog)
- override protected def currentApplicationInfo: Option[ApplicationInfo] = {
+ override protected def currentApplicationInfo(): Option[ApplicationInfo] = {
Option(client).map { cli =>
ApplicationInfo(
cli.engineId.orNull,
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/plugin/PluginLoader.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/plugin/PluginLoader.scala
index 17ad6952425..da4c8e4a9d1 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/plugin/PluginLoader.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/plugin/PluginLoader.scala
@@ -21,6 +21,7 @@ import scala.util.control.NonFatal
import org.apache.kyuubi.KyuubiException
import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.util.reflect.DynConstructors
private[kyuubi] object PluginLoader {
@@ -31,8 +32,7 @@ private[kyuubi] object PluginLoader {
}
try {
- Class.forName(advisorClass.get).getConstructor().newInstance()
- .asInstanceOf[SessionConfAdvisor]
+ DynConstructors.builder.impl(advisorClass.get).buildChecked[SessionConfAdvisor].newInstance()
} catch {
case _: ClassCastException =>
throw new KyuubiException(
@@ -45,8 +45,7 @@ private[kyuubi] object PluginLoader {
def loadGroupProvider(conf: KyuubiConf): GroupProvider = {
val groupProviderClass = conf.get(KyuubiConf.GROUP_PROVIDER)
try {
- Class.forName(groupProviderClass).getConstructor().newInstance()
- .asInstanceOf[GroupProvider]
+ DynConstructors.builder().impl(groupProviderClass).buildChecked[GroupProvider]().newInstance()
} catch {
case _: ClassCastException =>
throw new KyuubiException(
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/BackendServiceMetric.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/BackendServiceMetric.scala
index 68bf11d7f99..9da4b78c036 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/BackendServiceMetric.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/BackendServiceMetric.scala
@@ -20,7 +20,7 @@ package org.apache.kyuubi.server
import org.apache.hive.service.rpc.thrift._
import org.apache.kyuubi.metrics.{MetricsConstants, MetricsSystem}
-import org.apache.kyuubi.operation.{OperationHandle, OperationStatus}
+import org.apache.kyuubi.operation.{KyuubiOperation, OperationHandle, OperationStatus}
import org.apache.kyuubi.operation.FetchOrientation.FetchOrientation
import org.apache.kyuubi.service.BackendService
import org.apache.kyuubi.session.SessionHandle
@@ -183,9 +183,10 @@ trait BackendServiceMetric extends BackendService {
operationHandle: OperationHandle,
orientation: FetchOrientation,
maxRows: Int,
- fetchLog: Boolean): TRowSet = {
+ fetchLog: Boolean): TFetchResultsResp = {
MetricsSystem.timerTracing(MetricsConstants.BS_FETCH_RESULTS) {
- val rowSet = super.fetchResults(operationHandle, orientation, maxRows, fetchLog)
+ val fetchResultsResp = super.fetchResults(operationHandle, orientation, maxRows, fetchLog)
+ val rowSet = fetchResultsResp.getResults
// TODO: the statistics are wrong when we enabled the arrow.
val rowsSize =
if (rowSet.getColumnsSize > 0) {
@@ -207,7 +208,17 @@ trait BackendServiceMetric extends BackendService {
else MetricsConstants.BS_FETCH_RESULT_ROWS_RATE,
rowsSize))
- rowSet
+ val operation = sessionManager.operationManager
+ .getOperation(operationHandle)
+ .asInstanceOf[KyuubiOperation]
+
+ if (fetchLog) {
+ operation.increaseFetchLogCount(rowsSize)
+ } else {
+ operation.increaseFetchResultsCount(rowsSize)
+ }
+
+ fetchResultsResp
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiBatchService.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiBatchService.scala
new file mode 100644
index 00000000000..2bfbbce2ab7
--- /dev/null
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiBatchService.scala
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.server
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.kyuubi.config.KyuubiConf.BATCH_SUBMITTER_THREADS
+import org.apache.kyuubi.engine.ApplicationState
+import org.apache.kyuubi.operation.OperationState
+import org.apache.kyuubi.server.metadata.MetadataManager
+import org.apache.kyuubi.service.{AbstractService, Serverable}
+import org.apache.kyuubi.session.KyuubiSessionManager
+import org.apache.kyuubi.util.ThreadUtils
+
+class KyuubiBatchService(
+ server: Serverable,
+ sessionManager: KyuubiSessionManager)
+ extends AbstractService(classOf[KyuubiBatchService].getSimpleName) {
+
+ private lazy val restFrontend = server.frontendServices
+ .filter(_.isInstanceOf[KyuubiRestFrontendService])
+ .head
+
+ private def kyuubiInstance: String = restFrontend.connectionUrl
+
+ // TODO expose metrics, including pending/running/succeeded/failed batches
+ // TODO handle dangling batches, e.g. batch is picked and changed state to pending,
+ // but the Server crashed before submitting or updating status to metastore
+
+ private lazy val metadataManager: MetadataManager = sessionManager.metadataManager.get
+ private val running: AtomicBoolean = new AtomicBoolean(false)
+ private lazy val batchExecutor = ThreadUtils
+ .newDaemonFixedThreadPool(conf.get(BATCH_SUBMITTER_THREADS), "kyuubi-batch-submitter")
+
+ def cancelUnscheduledBatch(batchId: String): Boolean = {
+ metadataManager.cancelUnscheduledBatch(batchId)
+ }
+
+ def countBatch(
+ batchType: String,
+ batchUser: Option[String],
+ batchState: Option[String] = None,
+ kyuubiInstance: Option[String] = None): Int = {
+ metadataManager.countBatch(
+ batchType,
+ batchUser.orNull,
+ batchState.orNull,
+ kyuubiInstance.orNull)
+ }
+
+ override def start(): Unit = {
+ assert(running.compareAndSet(false, true))
+ val submitTask: Runnable = () => {
+ while (running.get) {
+ metadataManager.pickBatchForSubmitting(kyuubiInstance) match {
+ case None => Thread.sleep(1000)
+ case Some(metadata) =>
+ val batchId = metadata.identifier
+ info(s"$batchId is picked for submission.")
+ val batchSession = sessionManager.createBatchSession(
+ metadata.username,
+ "anonymous",
+ metadata.ipAddress,
+ metadata.requestConf,
+ metadata.engineType,
+ Option(metadata.requestName),
+ metadata.resource,
+ metadata.className,
+ metadata.requestArgs,
+ Some(metadata),
+ fromRecovery = false)
+ sessionManager.openBatchSession(batchSession)
+ var submitted = false
+ while (!submitted) { // block until batch job submitted
+ submitted = metadataManager.getBatchSessionMetadata(batchId) match {
+ case Some(metadata) if OperationState.isTerminal(metadata.opState) =>
+ true
+ case Some(metadata) if metadata.opState == OperationState.RUNNING =>
+ metadata.appState match {
+ // app that is not submitted to resource manager
+ case None | Some(ApplicationState.NOT_FOUND) => false
+ // app that is pending in resource manager
+ case Some(ApplicationState.PENDING) => false
+ // not sure, added for safety
+ case Some(ApplicationState.UNKNOWN) => false
+ case _ => true
+ }
+ case Some(_) =>
+ false
+ case None =>
+ error(s"$batchId does not exist in metastore, assume it has finished")
+ true
+ }
+ if (!submitted) Thread.sleep(1000)
+ }
+ info(s"$batchId is submitted or finished.")
+ }
+ }
+ }
+ (0 until batchExecutor.getCorePoolSize).foreach(_ => batchExecutor.submit(submitTask))
+ super.start()
+ }
+
+ override def stop(): Unit = {
+ super.stop()
+ if (running.compareAndSet(true, false)) {
+ ThreadUtils.shutdown(batchExecutor)
+ }
+ }
+}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiMySQLFrontendService.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiMySQLFrontendService.scala
index 96a2114aa95..1a449dde4f1 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiMySQLFrontendService.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiMySQLFrontendService.scala
@@ -94,7 +94,10 @@ class KyuubiMySQLFrontendService(override val serverable: Serverable)
override def connectionUrl: String = {
checkInitialized()
- s"${serverAddr.getCanonicalHostName}:$port"
+ conf.get(FRONTEND_ADVERTISED_HOST) match {
+ case Some(advertisedHost) => s"$advertisedHost:$port"
+ case None => s"${serverAddr.getCanonicalHostName}:$port"
+ }
}
override def start(): Unit = synchronized {
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiRestFrontendService.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiRestFrontendService.scala
index cd191afe834..28dfab731fd 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiRestFrontendService.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiRestFrontendService.scala
@@ -52,7 +52,7 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
private def hadoopConf: Configuration = KyuubiServer.getHadoopConf()
- private def sessionManager = be.sessionManager.asInstanceOf[KyuubiSessionManager]
+ private[kyuubi] def sessionManager = be.sessionManager.asInstanceOf[KyuubiSessionManager]
private val batchChecker = ThreadUtils.newDaemonSingleThreadScheduledExecutor("batch-checker")
@@ -68,19 +68,24 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
}
}
+ private lazy val port: Int = conf.get(FRONTEND_REST_BIND_PORT)
+
override def initialize(conf: KyuubiConf): Unit = synchronized {
this.conf = conf
server = JettyServer(
getName,
host,
- conf.get(FRONTEND_REST_BIND_PORT),
+ port,
conf.get(FRONTEND_REST_MAX_WORKER_THREADS))
super.initialize(conf)
}
override def connectionUrl: String = {
checkInitialized()
- server.getServerUri
+ conf.get(FRONTEND_ADVERTISED_HOST) match {
+ case Some(advertisedHost) => s"$advertisedHost:$port"
+ case None => server.getServerUri
+ }
}
private def startInternal(): Unit = {
@@ -90,6 +95,9 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
val authenticationFactory = new KyuubiHttpAuthenticationFactory(conf)
server.addHandler(authenticationFactory.httpHandlerWrapperFactory.wrapHandler(contextHandler))
+ val proxyHandler = ApiRootResource.getEngineUIProxyHandler(this)
+ server.addHandler(authenticationFactory.httpHandlerWrapperFactory.wrapHandler(proxyHandler))
+
server.addStaticHandler("org/apache/kyuubi/ui/static", "/static/")
server.addRedirectHandler("/", "/static/")
server.addRedirectHandler("/static", "/static/")
@@ -120,7 +128,7 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
sessionManager.getPeerInstanceClosedBatchSessions(connectionUrl).foreach { batch =>
Utils.tryLogNonFatalError {
val sessionHandle = SessionHandle.fromUUID(batch.identifier)
- Option(sessionManager.getBatchSessionImpl(sessionHandle)).foreach(_.close())
+ sessionManager.getBatchSession(sessionHandle).foreach(_.close())
}
}
} catch {
@@ -175,10 +183,16 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
if (!isStarted.get) {
try {
server.start()
- recoverBatchSessions()
isStarted.set(true)
startBatchChecker()
startInternal()
+ // block until the HTTP server is started, otherwise we may get
+ // the wrong HTTP server port (-1)
+ while (server.getState != "STARTED") {
+ info(s"Waiting for $getName's HTTP server to start")
+ Thread.sleep(1000)
+ }
+ recoverBatchSessions()
} catch {
case e: Exception => throw new KyuubiException(s"Cannot start $getName", e)
}
@@ -232,7 +246,9 @@ class KyuubiRestFrontendService(override val serverable: Serverable)
realUser
} else {
sessionConf.get(KyuubiAuthenticationFactory.HS2_PROXY_USER).map { proxyUser =>
- KyuubiAuthenticationFactory.verifyProxyAccess(realUser, proxyUser, ipAddress, hadoopConf)
+ if (!getConf.get(KyuubiConf.SERVER_ADMINISTRATORS).contains(realUser)) {
+ KyuubiAuthenticationFactory.verifyProxyAccess(realUser, proxyUser, ipAddress, hadoopConf)
+ }
proxyUser
}.getOrElse(realUser)
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiServer.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiServer.scala
index a7f2e817837..453ae0b7904 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiServer.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiServer.scala
@@ -25,7 +25,7 @@ import org.apache.hadoop.security.UserGroupInformation
import org.apache.kyuubi._
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.config.KyuubiConf.{FRONTEND_PROTOCOLS, FrontendProtocols}
+import org.apache.kyuubi.config.KyuubiConf.{BATCH_SUBMITTER_ENABLED, FRONTEND_PROTOCOLS, FrontendProtocols, KYUUBI_KUBERNETES_CONF_PREFIX}
import org.apache.kyuubi.config.KyuubiConf.FrontendProtocols._
import org.apache.kyuubi.events.{EventBus, KyuubiServerInfoEvent, ServerEventHandlerRegister}
import org.apache.kyuubi.ha.HighAvailabilityConf._
@@ -38,17 +38,20 @@ import org.apache.kyuubi.util.{KyuubiHadoopUtils, SignalRegister}
import org.apache.kyuubi.zookeeper.EmbeddedZookeeper
object KyuubiServer extends Logging {
- private val zkServer = new EmbeddedZookeeper()
private[kyuubi] var kyuubiServer: KyuubiServer = _
@volatile private[kyuubi] var hadoopConf: Configuration = _
def startServer(conf: KyuubiConf): KyuubiServer = {
hadoopConf = KyuubiHadoopUtils.newHadoopConf(conf)
+ var embeddedZkServer: Option[EmbeddedZookeeper] = None
if (!ServiceDiscovery.supportServiceDiscovery(conf)) {
- zkServer.initialize(conf)
- zkServer.start()
- conf.set(HA_ADDRESSES, zkServer.getConnectString)
- conf.set(HA_ZK_AUTH_TYPE, AuthTypes.NONE.toString)
+ embeddedZkServer = Some(new EmbeddedZookeeper())
+ embeddedZkServer.foreach(zkServer => {
+ zkServer.initialize(conf)
+ zkServer.start()
+ conf.set(HA_ADDRESSES, zkServer.getConnectString)
+ conf.set(HA_ZK_AUTH_TYPE, AuthTypes.NONE.toString)
+ })
}
val server = conf.get(KyuubiConf.SERVER_NAME) match {
@@ -59,9 +62,7 @@ object KyuubiServer extends Logging {
server.initialize(conf)
} catch {
case e: Exception =>
- if (zkServer.getServiceState == ServiceState.STARTED) {
- zkServer.stop()
- }
+ embeddedZkServer.filter(_.getServiceState == ServiceState.STARTED).foreach(_.stop())
throw e
}
server.start()
@@ -111,14 +112,29 @@ object KyuubiServer extends Logging {
private[kyuubi] def refreshUserDefaultsConf(): Unit = kyuubiServer.conf.synchronized {
val existedUserDefaults = kyuubiServer.conf.getAllUserDefaults
val refreshedUserDefaults = KyuubiConf().loadFileDefaults().getAllUserDefaults
+ refreshConfig("user defaults", existedUserDefaults, refreshedUserDefaults)
+ }
+
+ private[kyuubi] def refreshKubernetesConf(): Unit = kyuubiServer.conf.synchronized {
+ val existedKubernetesConf =
+ kyuubiServer.conf.getAll.filter(_._1.startsWith(KYUUBI_KUBERNETES_CONF_PREFIX))
+ val refreshedKubernetesConf =
+ KyuubiConf().loadFileDefaults().getAll.filter(_._1.startsWith(KYUUBI_KUBERNETES_CONF_PREFIX))
+ refreshConfig("kubernetes", existedKubernetesConf, refreshedKubernetesConf)
+ }
+
+ private def refreshConfig(
+ configDomain: String,
+ existing: Map[String, String],
+ refreshed: Map[String, String]): Unit = {
var (unsetCount, updatedCount, addedCount) = (0, 0, 0)
- for ((k, _) <- existedUserDefaults if !refreshedUserDefaults.contains(k)) {
+ for ((k, _) <- existing if !refreshed.contains(k)) {
kyuubiServer.conf.unset(k)
unsetCount = unsetCount + 1
}
- for ((k, v) <- refreshedUserDefaults) {
- if (existedUserDefaults.contains(k)) {
- if (!StringUtils.equals(existedUserDefaults.get(k).orNull, v)) {
+ for ((k, v) <- refreshed) {
+ if (existing.contains(k)) {
+ if (!StringUtils.equals(existing.get(k).orNull, v)) {
updatedCount = updatedCount + 1
}
} else {
@@ -126,17 +142,25 @@ object KyuubiServer extends Logging {
}
kyuubiServer.conf.set(k, v)
}
- info(s"Refreshed user defaults configs with changes of " +
+ info(s"Refreshed $configDomain configs with changes of " +
s"unset: $unsetCount, updated: $updatedCount, added: $addedCount")
}
private[kyuubi] def refreshUnlimitedUsers(): Unit = synchronized {
val sessionMgr = kyuubiServer.backendService.sessionManager.asInstanceOf[KyuubiSessionManager]
- val existingUnlimitedUsers = sessionMgr.getUnlimitedUsers()
+ val existingUnlimitedUsers = sessionMgr.getUnlimitedUsers
sessionMgr.refreshUnlimitedUsers(KyuubiConf().loadFileDefaults())
- val refreshedUnlimitedUsers = sessionMgr.getUnlimitedUsers()
+ val refreshedUnlimitedUsers = sessionMgr.getUnlimitedUsers
info(s"Refreshed unlimited users from $existingUnlimitedUsers to $refreshedUnlimitedUsers")
}
+
+ private[kyuubi] def refreshDenyUsers(): Unit = synchronized {
+ val sessionMgr = kyuubiServer.backendService.sessionManager.asInstanceOf[KyuubiSessionManager]
+ val existingDenyUsers = sessionMgr.getDenyUsers
+ sessionMgr.refreshDenyUsers(KyuubiConf().loadFileDefaults())
+ val refreshedDenyUsers = sessionMgr.getDenyUsers
+ info(s"Refreshed deny users from $existingDenyUsers to $refreshedDenyUsers")
+ }
}
class KyuubiServer(name: String) extends Serverable(name) {
@@ -164,8 +188,6 @@ class KyuubiServer(name: String) extends Serverable(name) {
}
override def initialize(conf: KyuubiConf): Unit = synchronized {
- initLoggerEventHandler(conf)
-
val kinit = new KinitAuxiliaryService()
addService(kinit)
@@ -175,7 +197,15 @@ class KyuubiServer(name: String) extends Serverable(name) {
if (conf.get(MetricsConf.METRICS_ENABLED)) {
addService(new MetricsSystem)
}
+
+ if (conf.isRESTEnabled && conf.get(BATCH_SUBMITTER_ENABLED)) {
+ addService(new KyuubiBatchService(
+ this,
+ backendService.sessionManager.asInstanceOf[KyuubiSessionManager]))
+ }
super.initialize(conf)
+
+ initLoggerEventHandler(conf)
}
override def start(): Unit = {
@@ -193,5 +223,7 @@ class KyuubiServer(name: String) extends Serverable(name) {
ServerEventHandlerRegister.registerEventLoggers(conf)
}
- override protected def stopServer(): Unit = {}
+ override protected def stopServer(): Unit = {
+ EventBus.deregisterAll()
+ }
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTHttpFrontendService.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTHttpFrontendService.scala
index 63933aa7724..79351118c50 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTHttpFrontendService.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTHttpFrontendService.scala
@@ -278,7 +278,11 @@ final class KyuubiTHttpFrontendService(
val realUser = getShortName(Option(SessionManager.getUserName).getOrElse(req.getUsername))
// using the remote ip address instead of that in proxy http header for authentication
val ipAddress: String = SessionManager.getIpAddress
- val sessionUser: String = getProxyUser(req.getConfiguration, ipAddress, realUser)
+ val sessionUser: String = if (req.getConfiguration == null) {
+ realUser
+ } else {
+ getProxyUser(req.getConfiguration, ipAddress, realUser)
+ }
debug(s"Client's real user: $realUser, session user: $sessionUser")
realUser -> sessionUser
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTrinoFrontendService.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTrinoFrontendService.scala
index 573bb948f90..95f6d590265 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTrinoFrontendService.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiTrinoFrontendService.scala
@@ -21,7 +21,7 @@ import java.util.concurrent.atomic.AtomicBoolean
import org.apache.kyuubi.{KyuubiException, Utils}
import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.config.KyuubiConf.{FRONTEND_TRINO_BIND_HOST, FRONTEND_TRINO_BIND_PORT, FRONTEND_TRINO_MAX_WORKER_THREADS}
+import org.apache.kyuubi.config.KyuubiConf.{FRONTEND_ADVERTISED_HOST, FRONTEND_TRINO_BIND_HOST, FRONTEND_TRINO_BIND_PORT, FRONTEND_TRINO_MAX_WORKER_THREADS}
import org.apache.kyuubi.server.trino.api.v1.ApiRootResource
import org.apache.kyuubi.server.ui.JettyServer
import org.apache.kyuubi.service.{AbstractFrontendService, Serverable, Service}
@@ -46,19 +46,24 @@ class KyuubiTrinoFrontendService(override val serverable: Serverable)
}
}
+ private lazy val port: Int = conf.get(FRONTEND_TRINO_BIND_PORT)
+
override def initialize(conf: KyuubiConf): Unit = synchronized {
this.conf = conf
server = JettyServer(
getName,
host,
- conf.get(FRONTEND_TRINO_BIND_PORT),
+ port,
conf.get(FRONTEND_TRINO_MAX_WORKER_THREADS))
super.initialize(conf)
}
override def connectionUrl: String = {
checkInitialized()
- server.getServerUri
+ conf.get(FRONTEND_ADVERTISED_HOST) match {
+ case Some(advertisedHost) => s"$advertisedHost:$port"
+ case None => server.getServerUri
+ }
}
private def startInternal(): Unit = {
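The Trino frontend change above makes `connectionUrl` honor `FRONTEND_ADVERTISED_HOST`, which matters when the bind address (e.g. `0.0.0.0` inside a container) is not reachable by clients. A sketch of the resolution order, with hypothetical parameter names:

```python
def connection_url(advertised_host, bind_port, server_uri):
    """Prefer the configured advertised host (useful behind NAT or in
    Kubernetes) over the Jetty server's own bind URI."""
    if advertised_host is not None:
        return f"{advertised_host}:{bind_port}"
    return server_uri
```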
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/ApiUtils.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/ApiUtils.scala
index ebbf04c9073..5aaf4d7780f 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/ApiUtils.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/ApiUtils.scala
@@ -19,13 +19,14 @@ package org.apache.kyuubi.server.api
import scala.collection.JavaConverters._
-import org.apache.kyuubi.Utils
-import org.apache.kyuubi.client.api.v1.dto.{OperationData, SessionData}
+import org.apache.kyuubi.{Logging, Utils}
+import org.apache.kyuubi.client.api.v1.dto.{OperationData, ServerData, SessionData}
import org.apache.kyuubi.events.KyuubiOperationEvent
+import org.apache.kyuubi.ha.client.ServiceNodeInfo
import org.apache.kyuubi.operation.KyuubiOperation
import org.apache.kyuubi.session.KyuubiSession
-object ApiUtils {
+object ApiUtils extends Logging {
def sessionData(session: KyuubiSession): SessionData = {
val sessionEvent = session.getSessionEvent
@@ -56,6 +57,23 @@ object ApiUtils {
opEvent.sessionId,
opEvent.sessionUser,
opEvent.sessionType,
- operation.getSession.asInstanceOf[KyuubiSession].connectionUrl)
+ operation.getSession.asInstanceOf[KyuubiSession].connectionUrl,
+ operation.metrics.asJava)
+ }
+
+ def serverData(nodeInfo: ServiceNodeInfo): ServerData = {
+ new ServerData(
+ nodeInfo.nodeName,
+ nodeInfo.namespace,
+ nodeInfo.instance,
+ nodeInfo.host,
+ nodeInfo.port,
+ nodeInfo.attributes.asJava,
+ "Running")
+ }
+
+ def logAndRefineErrorMsg(errorMsg: String, throwable: Throwable): String = {
+ error(errorMsg, throwable)
+ s"$errorMsg: ${Utils.prettyPrint(throwable)}"
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/EngineUIProxyServlet.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/EngineUIProxyServlet.scala
new file mode 100644
index 00000000000..021a2ad85ed
--- /dev/null
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/EngineUIProxyServlet.scala
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.server.api
+
+import java.net.URL
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.commons.lang3.StringUtils
+import org.eclipse.jetty.client.api.Request
+import org.eclipse.jetty.proxy.ProxyServlet
+
+import org.apache.kyuubi.Logging
+
+private[api] class EngineUIProxyServlet extends ProxyServlet with Logging {
+
+ override def rewriteTarget(request: HttpServletRequest): String = {
+ val requestURL = request.getRequestURL
+ val requestURI = request.getRequestURI
+ var targetURL = "/no-ui-error"
+ extractTargetAddress(requestURI).foreach { case (host, port) =>
+ val targetURI = requestURI.stripPrefix(s"/engine-ui/$host:$port") match {
+          // for some reason, the proxy cannot handle redirects well; as a workaround,
+          // we simulate the Spark UI redirection behavior and forcibly rewrite the
+          // empty URI to the Spark Jobs page.
+ case "" | "/" => "/jobs/"
+ case path => path
+ }
+ val targetQueryString =
+ Option(request.getQueryString).filter(StringUtils.isNotEmpty).map(q => s"?$q").getOrElse("")
+ targetURL = new URL("http", host, port, targetURI + targetQueryString).toString
+ }
+ debug(s"rewrite $requestURL => $targetURL")
+ targetURL
+ }
+
+ override def addXForwardedHeaders(
+ clientRequest: HttpServletRequest,
+ proxyRequest: Request): Unit = {
+ val requestURI = clientRequest.getRequestURI
+ extractTargetAddress(requestURI).foreach { case (host, port) =>
+ // SPARK-24209: Knox uses X-Forwarded-Context to notify the application the base path
+ proxyRequest.header("X-Forwarded-Context", s"/engine-ui/$host:$port")
+ }
+ super.addXForwardedHeaders(clientRequest, proxyRequest)
+ }
+
+ private val r = "^/engine-ui/([^/:]+):(\\d+)/?.*".r
+ private def extractTargetAddress(requestURI: String): Option[(String, Int)] =
+ requestURI match {
+ case r(host, port) => Some(host -> port.toInt)
+ case _ => None
+ }
+}
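The new servlet above proxies engine web UIs through paths of the form `/engine-ui/<host>:<port>/...`. Its core is the regex-based target extraction plus the empty-path rewrite to `/jobs/`. A self-contained Python sketch of that rewrite (an analogue of the Scala `rewriteTarget`, not the servlet itself):

```python
import re

# Mirrors the servlet's pattern: /engine-ui/<host>:<port>[/path...]
_TARGET = re.compile(r"^/engine-ui/([^/:]+):(\d+)/?.*")

def rewrite_target(request_uri, query_string=None):
    """Rewrite an /engine-ui proxy URI to the engine's own HTTP URL."""
    m = _TARGET.match(request_uri)
    if not m:
        return "/no-ui-error"
    host, port = m.group(1), int(m.group(2))
    path = request_uri[len(f"/engine-ui/{host}:{port}"):]
    if path in ("", "/"):
        # Simulate the Spark UI redirect: an empty path goes to the Jobs page.
        path = "/jobs/"
    query = f"?{query_string}" if query_string else ""
    return f"http://{host}:{port}{path}{query}"
```

The `X-Forwarded-Context` header added alongside tells the proxied Spark UI its base path, so links it renders stay under the `/engine-ui/<host>:<port>` prefix.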
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/KyuubiScalaObjectMapper.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/KyuubiScalaObjectMapper.scala
index 776c35ba731..724da120999 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/KyuubiScalaObjectMapper.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/KyuubiScalaObjectMapper.scala
@@ -19,11 +19,13 @@ package org.apache.kyuubi.server.api
import javax.ws.rs.ext.ContextResolver
-import com.fasterxml.jackson.databind.ObjectMapper
+import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
class KyuubiScalaObjectMapper extends ContextResolver[ObjectMapper] {
- private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
+ private val mapper = new ObjectMapper()
+ .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
+ .registerModule(DefaultScalaModule)
override def getContext(aClass: Class[_]): ObjectMapper = mapper
}
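Disabling `FAIL_ON_UNKNOWN_PROPERTIES` above makes the REST layer tolerant of payloads carrying fields the server does not know, which helps when client and server versions drift. A rough Python analogue of that lenient deserialization (illustrative only; Jackson does this transparently during binding):

```python
import json

def parse_known(payload, known_fields):
    """Bind only recognized fields, silently dropping unknown ones —
    the analogue of configuring FAIL_ON_UNKNOWN_PROPERTIES to false."""
    data = json.loads(payload)
    return {k: v for k, v in data.items() if k in known_fields}
```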
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala
index deadcf9abe4..93953a577dc 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala
@@ -25,7 +25,8 @@ import javax.ws.rs.ext.{ExceptionMapper, Provider}
import org.eclipse.jetty.server.handler.ContextHandler
-import org.apache.kyuubi.server.KyuubiRestFrontendService
+import org.apache.kyuubi.Logging
+import org.apache.kyuubi.server.{KyuubiBatchService, KyuubiRestFrontendService, KyuubiServer}
private[api] trait ApiRequestContext {
@@ -35,22 +36,28 @@ private[api] trait ApiRequestContext {
@Context
protected var httpRequest: HttpServletRequest = _
+ protected lazy val batchService: Option[KyuubiBatchService] =
+ KyuubiServer.kyuubiServer.getServices
+ .find(_.isInstanceOf[KyuubiBatchService])
+ .map(_.asInstanceOf[KyuubiBatchService])
+
final protected def fe: KyuubiRestFrontendService = FrontendServiceContext.get(servletContext)
}
@Provider
-class RestExceptionMapper extends ExceptionMapper[Exception] {
+class RestExceptionMapper extends ExceptionMapper[Exception] with Logging {
override def toResponse(exception: Exception): Response = {
+ warn("Error occurs on accessing REST API.", exception)
exception match {
case e: WebApplicationException =>
Response.status(e.getResponse.getStatus)
- .`type`(e.getResponse.getMediaType)
- .entity(e.getMessage)
+ .`type`(MediaType.APPLICATION_JSON)
+ .entity(Map("message" -> e.getMessage))
.build()
case e =>
Response.status(Response.Status.INTERNAL_SERVER_ERROR)
.`type`(MediaType.APPLICATION_JSON)
- .entity(e.getMessage)
+ .entity(Map("message" -> e.getMessage))
.build()
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/AdminResource.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/AdminResource.scala
index 0d8b31b2c65..0c2065ff1dd 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/AdminResource.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/AdminResource.scala
@@ -28,10 +28,9 @@ import io.swagger.v3.oas.annotations.media.{ArraySchema, Content, Schema}
import io.swagger.v3.oas.annotations.responses.ApiResponse
import io.swagger.v3.oas.annotations.tags.Tag
import org.apache.commons.lang3.StringUtils
-import org.apache.zookeeper.KeeperException.NoNodeException
import org.apache.kyuubi.{KYUUBI_VERSION, Logging, Utils}
-import org.apache.kyuubi.client.api.v1.dto.{Engine, OperationData, SessionData}
+import org.apache.kyuubi.client.api.v1.dto._
import org.apache.kyuubi.config.KyuubiConf
import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.ha.HighAvailabilityConf.HA_NAMESPACE
@@ -41,11 +40,12 @@ import org.apache.kyuubi.operation.{KyuubiOperation, OperationHandle}
import org.apache.kyuubi.server.KyuubiServer
import org.apache.kyuubi.server.api.{ApiRequestContext, ApiUtils}
import org.apache.kyuubi.session.{KyuubiSession, SessionHandle}
+import org.apache.kyuubi.shaded.zookeeper.KeeperException.NoNodeException
@Tag(name = "Admin")
@Produces(Array(MediaType.APPLICATION_JSON))
private[v1] class AdminResource extends ApiRequestContext with Logging {
- private lazy val administrators = fe.getConf.get(KyuubiConf.SERVER_ADMINISTRATORS).toSet +
+ private lazy val administrators = fe.getConf.get(KyuubiConf.SERVER_ADMINISTRATORS) +
Utils.currentUser
@ApiResponse(
@@ -87,6 +87,25 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
Response.ok(s"Refresh the user defaults conf successfully.").build()
}
+ @ApiResponse(
+ responseCode = "200",
+ content = Array(new Content(mediaType = MediaType.APPLICATION_JSON)),
+ description = "refresh the kubernetes configs")
+ @POST
+ @Path("refresh/kubernetes_conf")
+ def refreshKubernetesConf(): Response = {
+ val userName = fe.getSessionUser(Map.empty[String, String])
+ val ipAddress = fe.getIpAddress
+ info(s"Receive refresh kubernetes conf request from $userName/$ipAddress")
+ if (!isAdministrator(userName)) {
+ throw new NotAllowedException(
+ s"$userName is not allowed to refresh the kubernetes conf")
+ }
+ info(s"Reloading kubernetes conf")
+ KyuubiServer.refreshKubernetesConf()
+ Response.ok(s"Refresh the kubernetes conf successfully.").build()
+ }
+
@ApiResponse(
responseCode = "200",
content = Array(new Content(mediaType = MediaType.APPLICATION_JSON)),
@@ -106,6 +125,25 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
Response.ok(s"Refresh the unlimited users successfully.").build()
}
+ @ApiResponse(
+ responseCode = "200",
+ content = Array(new Content(mediaType = MediaType.APPLICATION_JSON)),
+ description = "refresh the deny users")
+ @POST
+ @Path("refresh/deny_users")
+ def refreshDenyUser(): Response = {
+ val userName = fe.getSessionUser(Map.empty[String, String])
+ val ipAddress = fe.getIpAddress
+ info(s"Receive refresh deny users request from $userName/$ipAddress")
+ if (!isAdministrator(userName)) {
+ throw new NotAllowedException(
+ s"$userName is not allowed to refresh the deny users")
+ }
+ info(s"Reloading deny users")
+ KyuubiServer.refreshDenyUsers()
+ Response.ok(s"Refresh the deny users successfully.").build()
+ }
+
@ApiResponse(
responseCode = "200",
content = Array(new Content(
@@ -127,9 +165,7 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
val usersSet = users.split(",").toSet
sessions = sessions.filter(session => usersSet.contains(session.user))
}
- sessions.map { case session =>
- ApiUtils.sessionData(session.asInstanceOf[KyuubiSession])
- }.toSeq
+ sessions.map(session => ApiUtils.sessionData(session.asInstanceOf[KyuubiSession])).toSeq
}
@ApiResponse(
@@ -259,7 +295,7 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
val engine = getEngine(userName, engineType, shareLevel, subdomain, "")
val engineSpace = getEngineSpace(engine)
- var engineNodes = ListBuffer[ServiceNodeInfo]()
+ val engineNodes = ListBuffer[ServiceNodeInfo]()
Option(subdomain).filter(_.nonEmpty) match {
case Some(_) =>
withDiscoveryClient(fe.getConf) { discoveryClient =>
@@ -294,6 +330,36 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
node.instance,
node.namespace,
node.attributes.asJava))
+ .toSeq
+ }
+
+ @ApiResponse(
+ responseCode = "200",
+ content = Array(
+ new Content(
+ mediaType = MediaType.APPLICATION_JSON,
+ array = new ArraySchema(schema = new Schema(implementation =
+ classOf[OperationData])))),
+ description = "list all live kyuubi servers")
+ @GET
+ @Path("server")
+ def listServers(): Seq[ServerData] = {
+ val userName = fe.getSessionUser(Map.empty[String, String])
+ val ipAddress = fe.getIpAddress
+ info(s"Received list all live kyuubi servers request from $userName/$ipAddress")
+ if (!isAdministrator(userName)) {
+ throw new NotAllowedException(
+ s"$userName is not allowed to list all live kyuubi servers")
+ }
+ val kyuubiConf = fe.getConf
+ val servers = ListBuffer[ServerData]()
+ val serverSpec = DiscoveryPaths.makePath(null, kyuubiConf.get(HA_NAMESPACE))
+ withDiscoveryClient(kyuubiConf) { discoveryClient =>
+ discoveryClient.getServiceNodesInfo(serverSpec).map(nodeInfo => {
+ servers += ApiUtils.serverData(nodeInfo)
+ })
+ }
+ servers.toSeq
}
private def getEngine(
@@ -326,13 +392,44 @@ private[v1] class AdminResource extends ApiRequestContext with Logging {
private def getEngineSpace(engine: Engine): String = {
val serverSpace = fe.getConf.get(HA_NAMESPACE)
+ val appUser = engine.getSharelevel match {
+ case "GROUP" =>
+ fe.sessionManager.groupProvider.primaryGroup(engine.getUser, fe.getConf.getAll.asJava)
+ case _ => engine.getUser
+ }
+
DiscoveryPaths.makePath(
s"${serverSpace}_${engine.getVersion}_${engine.getSharelevel}_${engine.getEngineType}",
- engine.getUser,
+ appUser,
engine.getSubdomain)
}
+ @ApiResponse(
+ responseCode = "200",
+ content = Array(new Content(
+ mediaType = MediaType.APPLICATION_JSON,
+ schema = new Schema(implementation = classOf[Count]))),
+ description = "get the batch count")
+ @GET
+ @Path("batch/count")
+ def countBatch(
+ @QueryParam("batchType") @DefaultValue("SPARK") batchType: String,
+ @QueryParam("batchUser") batchUser: String,
+ @QueryParam("batchState") batchState: String): Count = {
+ val userName = fe.getSessionUser(Map.empty[String, String])
+ val ipAddress = fe.getIpAddress
+ info(s"Received counting batches request from $userName/$ipAddress")
+ if (!isAdministrator(userName)) {
+ throw new NotAllowedException(
+ s"$userName is not allowed to count the batches")
+ }
+ val batchCount = batchService
+ .map(_.countBatch(batchType, Option(batchUser), Option(batchState)))
+ .getOrElse(0)
+ new Count(batchCount)
+ }
+
private def isAdministrator(userName: String): Boolean = {
- administrators.contains(userName);
+ administrators.contains(userName)
}
}
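The `getEngineSpace` fix above resolves GROUP-share-level engines under the user's primary group rather than the user itself, so admins can locate engines registered by group. A sketch of the namespace resolution (path layout and helper names are approximations of the discovery-path scheme in the hunk):

```python
def engine_space(server_space, version, share_level, engine_type,
                 user, subdomain, primary_group_of):
    """Resolve the discovery namespace for an engine; GROUP-level engines
    register under the user's primary group instead of the user."""
    owner = primary_group_of(user) if share_level == "GROUP" else user
    return "/".join([
        f"{server_space}_{version}_{share_level}_{engine_type}",
        owner,
        subdomain,
    ])
```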
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/ApiRootResource.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/ApiRootResource.scala
index d8b997e865c..8abc23ff1bd 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/ApiRootResource.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/ApiRootResource.scala
@@ -30,7 +30,7 @@ import org.glassfish.jersey.servlet.ServletContainer
import org.apache.kyuubi.KYUUBI_VERSION
import org.apache.kyuubi.client.api.v1.dto._
import org.apache.kyuubi.server.KyuubiRestFrontendService
-import org.apache.kyuubi.server.api.{ApiRequestContext, FrontendServiceContext, OpenAPIConfig}
+import org.apache.kyuubi.server.api.{ApiRequestContext, EngineUIProxyServlet, FrontendServiceContext, OpenAPIConfig}
@Path("/v1")
private[v1] class ApiRootResource extends ApiRequestContext {
@@ -82,4 +82,13 @@ private[server] object ApiRootResource {
handler.addServlet(holder, "/*")
handler
}
+
+ def getEngineUIProxyHandler(fe: KyuubiRestFrontendService): ServletContextHandler = {
+ val proxyServlet = new EngineUIProxyServlet()
+ val holder = new ServletHolder(proxyServlet)
+ val proxyHandler = new ServletContextHandler(ServletContextHandler.NO_SESSIONS)
+ proxyHandler.setContextPath("/engine-ui")
+ proxyHandler.addServlet(holder, "/*")
+ proxyHandler
+ }
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala
index 4814996a4a1..76d913a98c7 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala
@@ -38,15 +38,16 @@ import org.apache.kyuubi.{Logging, Utils}
import org.apache.kyuubi.client.api.v1.dto._
import org.apache.kyuubi.client.exception.KyuubiRestException
import org.apache.kyuubi.client.util.BatchUtils._
-import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.config.KyuubiConf._
import org.apache.kyuubi.config.KyuubiReservedKeys._
-import org.apache.kyuubi.engine.{ApplicationInfo, KyuubiApplicationManager}
+import org.apache.kyuubi.engine.{ApplicationInfo, ApplicationManagerInfo, KillResponse, KyuubiApplicationManager}
import org.apache.kyuubi.operation.{BatchJobSubmission, FetchOrientation, OperationState}
+import org.apache.kyuubi.server.KyuubiServer
import org.apache.kyuubi.server.api.ApiRequestContext
import org.apache.kyuubi.server.api.v1.BatchesResource._
import org.apache.kyuubi.server.metadata.MetadataManager
-import org.apache.kyuubi.server.metadata.api.Metadata
-import org.apache.kyuubi.session.{KyuubiBatchSessionImpl, KyuubiSessionManager, SessionHandle}
+import org.apache.kyuubi.server.metadata.api.{Metadata, MetadataFilter}
+import org.apache.kyuubi.session.{KyuubiBatchSession, KyuubiSessionManager, SessionHandle, SessionType}
import org.apache.kyuubi.util.JdbcUtils
@Tag(name = "Batch")
@@ -54,45 +55,38 @@ import org.apache.kyuubi.util.JdbcUtils
private[v1] class BatchesResource extends ApiRequestContext with Logging {
private val internalRestClients = new ConcurrentHashMap[String, InternalRestClient]()
private lazy val internalSocketTimeout =
- fe.getConf.get(KyuubiConf.BATCH_INTERNAL_REST_CLIENT_SOCKET_TIMEOUT)
+ fe.getConf.get(BATCH_INTERNAL_REST_CLIENT_SOCKET_TIMEOUT).toInt
private lazy val internalConnectTimeout =
- fe.getConf.get(KyuubiConf.BATCH_INTERNAL_REST_CLIENT_CONNECT_TIMEOUT)
+ fe.getConf.get(BATCH_INTERNAL_REST_CLIENT_CONNECT_TIMEOUT).toInt
+
+ private def batchV2Enabled(reqConf: Map[String, String]): Boolean = {
+ KyuubiServer.kyuubiServer.getConf.get(BATCH_SUBMITTER_ENABLED) &&
+ reqConf.getOrElse(BATCH_IMPL_VERSION.key, fe.getConf.get(BATCH_IMPL_VERSION)) == "2"
+ }
private def getInternalRestClient(kyuubiInstance: String): InternalRestClient = {
internalRestClients.computeIfAbsent(
kyuubiInstance,
- kyuubiInstance => {
- new InternalRestClient(
- kyuubiInstance,
- internalSocketTimeout.toInt,
- internalConnectTimeout.toInt)
- })
+ k => new InternalRestClient(k, internalSocketTimeout, internalConnectTimeout))
}
private def sessionManager = fe.be.sessionManager.asInstanceOf[KyuubiSessionManager]
- private def buildBatch(session: KyuubiBatchSessionImpl): Batch = {
+ private def buildBatch(session: KyuubiBatchSession): Batch = {
val batchOp = session.batchJobSubmissionOp
val batchOpStatus = batchOp.getStatus
- val batchAppStatus = batchOp.getOrFetchCurrentApplicationInfo
-
- val name = Option(batchOp.batchName).getOrElse(batchAppStatus.map(_.name).orNull)
- var appId: String = null
- var appUrl: String = null
- var appState: String = null
- var appDiagnostic: String = null
-
- if (batchAppStatus.nonEmpty) {
- appId = batchAppStatus.get.id
- appUrl = batchAppStatus.get.url.orNull
- appState = batchAppStatus.get.state.toString
- appDiagnostic = batchAppStatus.get.error.orNull
- } else {
- val metadata = sessionManager.getBatchMetadata(batchOp.batchId)
- appId = metadata.engineId
- appUrl = metadata.engineUrl
- appState = metadata.engineState
- appDiagnostic = metadata.engineError.orNull
+
+ val (name, appId, appUrl, appState, appDiagnostic) = batchOp.getApplicationInfo.map { appInfo =>
+ val name = Option(batchOp.batchName).getOrElse(appInfo.name)
+ (name, appInfo.id, appInfo.url.orNull, appInfo.state.toString, appInfo.error.orNull)
+ }.getOrElse {
+ sessionManager.getBatchMetadata(batchOp.batchId) match {
+ case Some(batch) =>
+ val diagnostic = batch.engineError.orNull
+ (batchOp.batchName, batch.engineId, batch.engineUrl, batch.engineState, diagnostic)
+ case None =>
+ (batchOp.batchName, null, null, null, null)
+ }
}
new Batch(
@@ -185,8 +179,8 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
@FormDataParam("resourceFile") resourceFileInputStream: InputStream,
@FormDataParam("resourceFile") resourceFileMetadata: FormDataContentDisposition): Batch = {
require(
- fe.getConf.get(KyuubiConf.BATCH_RESOURCE_UPLOAD_ENABLED),
- "Batch resource upload function is not enabled.")
+ fe.getConf.get(BATCH_RESOURCE_UPLOAD_ENABLED),
+ "Batch resource upload function is disabled.")
require(
batchRequest != null,
"batchRequest is required and please check the content type" +
@@ -228,7 +222,7 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
}
userProvidedBatchId.flatMap { batchId =>
- Option(sessionManager.getBatchFromMetadataStore(batchId))
+ sessionManager.getBatchFromMetadataStore(batchId)
} match {
case Some(batch) =>
markDuplicated(batch)
@@ -245,20 +239,51 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
KYUUBI_SESSION_CONNECTION_URL_KEY -> fe.connectionUrl,
KYUUBI_SESSION_REAL_USER_KEY -> fe.getRealUser())).asJava)
+ if (batchV2Enabled(request.getConf.asScala.toMap)) {
+ logger.info(s"Submit batch job $batchId using Batch API v2")
+ return Try {
+ sessionManager.initializeBatchState(
+ userName,
+ ipAddress,
+ request.getConf.asScala.toMap,
+ request)
+ } match {
+ case Success(batchId) =>
+ sessionManager.getBatchFromMetadataStore(batchId) match {
+ case Some(batch) => batch
+ case None => throw new IllegalStateException(
+ s"can not find batch $batchId from metadata store")
+ }
+ case Failure(cause) if JdbcUtils.isDuplicatedKeyDBErr(cause) =>
+ sessionManager.getBatchFromMetadataStore(batchId) match {
+ case Some(batch) => markDuplicated(batch)
+ case None => throw new IllegalStateException(
+ s"can not find duplicated batch $batchId from metadata store")
+ }
+ case Failure(cause) => throw new IllegalStateException(cause)
+ }
+ }
+
Try {
sessionManager.openBatchSession(
userName,
"anonymous",
ipAddress,
- request.getConf.asScala.toMap,
request)
} match {
case Success(sessionHandle) =>
- buildBatch(sessionManager.getBatchSessionImpl(sessionHandle))
+ sessionManager.getBatchSession(sessionHandle) match {
+ case Some(batchSession) => buildBatch(batchSession)
+ case None => throw new IllegalStateException(
+ s"can not find batch $batchId from metadata store")
+ }
case Failure(cause) if JdbcUtils.isDuplicatedKeyDBErr(cause) =>
- val batch = sessionManager.getBatchFromMetadataStore(batchId)
- assert(batch != null, s"can not find duplicated batch $batchId from metadata store")
- markDuplicated(batch)
+ sessionManager.getBatchFromMetadataStore(batchId) match {
+ case Some(batch) => markDuplicated(batch)
+ case None => throw new IllegalStateException(
+ s"can not find duplicated batch $batchId from metadata store")
+ }
+ case Failure(cause) => throw new IllegalStateException(cause)
}
}
}
@@ -280,11 +305,12 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
def batchInfo(@PathParam("batchId") batchId: String): Batch = {
val userName = fe.getSessionUser(Map.empty[String, String])
val sessionHandle = formatSessionHandle(batchId)
- Option(sessionManager.getBatchSessionImpl(sessionHandle)).map { batchSession =>
+ sessionManager.getBatchSession(sessionHandle).map { batchSession =>
buildBatch(batchSession)
}.getOrElse {
- Option(sessionManager.getBatchMetadata(batchId)).map { metadata =>
- if (OperationState.isTerminal(OperationState.withName(metadata.state)) ||
+ sessionManager.getBatchMetadata(batchId).map { metadata =>
+ if (batchV2Enabled(metadata.requestConf) ||
+ OperationState.isTerminal(OperationState.withName(metadata.state)) ||
metadata.kyuubiInstance == fe.connectionUrl) {
MetadataManager.buildBatch(metadata)
} else {
@@ -295,9 +321,11 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
case e: KyuubiRestException =>
error(s"Error redirecting get batch[$batchId] to ${metadata.kyuubiInstance}", e)
val batchAppStatus = sessionManager.applicationManager.getApplicationInfo(
- metadata.clusterManager,
+ metadata.appMgrInfo,
batchId,
- Some(metadata.createTime))
+ Some(userName),
+            // prevent the batch from being marked as terminated if the application state is NOT_FOUND
+ Some(metadata.engineOpenTime).filter(_ > 0).orElse(Some(System.currentTimeMillis)))
buildBatch(metadata, batchAppStatus)
}
}
@@ -320,6 +348,7 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
@QueryParam("batchType") batchType: String,
@QueryParam("batchState") batchState: String,
@QueryParam("batchUser") batchUser: String,
+ @QueryParam("batchName") batchName: String,
@QueryParam("createTime") createTime: Long,
@QueryParam("endTime") endTime: Long,
@QueryParam("from") from: Int,
@@ -332,15 +361,16 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
validBatchState(batchState),
s"The valid batch state can be one of the following: ${VALID_BATCH_STATES.mkString(",")}")
}
- val batches =
- sessionManager.getBatchesFromMetadataStore(
- batchType,
- batchUser,
- batchState,
- createTime,
- endTime,
- from,
- size)
+
+ val filter = MetadataFilter(
+ sessionType = SessionType.BATCH,
+ engineType = batchType,
+ username = batchUser,
+ state = batchState,
+ requestName = batchName,
+ createTime = createTime,
+ endTime = endTime)
+ val batches = sessionManager.getBatchesFromMetadataStore(filter, from, size)
new GetBatchesResponse(from, batches.size, batches.asJava)
}
@@ -358,7 +388,7 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
@QueryParam("size") @DefaultValue("100") size: Int): OperationLog = {
val userName = fe.getSessionUser(Map.empty[String, String])
val sessionHandle = formatSessionHandle(batchId)
- Option(sessionManager.getBatchSessionImpl(sessionHandle)).map { batchSession =>
+ sessionManager.getBatchSession(sessionHandle).map { batchSession =>
try {
val submissionOp = batchSession.batchJobSubmissionOp
val rowSet = submissionOp.getOperationLogRowSet(FetchOrientation.FETCH_NEXT, from, size)
@@ -378,10 +408,21 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
throw new NotFoundException(errorMsg)
}
}.getOrElse {
- Option(sessionManager.getBatchMetadata(batchId)).map { metadata =>
- if (fe.connectionUrl != metadata.kyuubiInstance) {
+ sessionManager.getBatchMetadata(batchId).map { metadata =>
+ if (batchV2Enabled(metadata.requestConf) && metadata.state == "INITIALIZED") {
+ info(s"Batch $batchId is waiting for scheduling")
+ val dummyLogs = List(s"Batch $batchId is waiting for scheduling").asJava
+ new OperationLog(dummyLogs, dummyLogs.size)
+ } else if (fe.connectionUrl != metadata.kyuubiInstance) {
val internalRestClient = getInternalRestClient(metadata.kyuubiInstance)
internalRestClient.getBatchLocalLog(userName, batchId, from, size)
+ } else if (batchV2Enabled(metadata.requestConf) &&
+ // in batch v2 impl, the operation state is changed from PENDING to RUNNING
+ // before being added to SessionManager.
+ (metadata.state == "PENDING" || metadata.state == "RUNNING")) {
+ info(s"Batch $batchId is waiting for submitting")
+ val dummyLogs = List(s"Batch $batchId is waiting for submitting").asJava
+ new OperationLog(dummyLogs, dummyLogs.size)
} else {
throw new NotFoundException(s"No local log found for batch: $batchId")
}
@@ -403,29 +444,50 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
def closeBatchSession(
@PathParam("batchId") batchId: String,
@QueryParam("hive.server2.proxy.user") hs2ProxyUser: String): CloseBatchResponse = {
- val sessionHandle = formatSessionHandle(batchId)
-
- val userName = fe.getSessionUser(hs2ProxyUser)
- Option(sessionManager.getBatchSessionImpl(sessionHandle)).map { batchSession =>
- if (userName != batchSession.user) {
+ def checkPermission(operator: String, owner: String): Unit = {
+ if (operator != owner) {
throw new WebApplicationException(
- s"$userName is not allowed to close the session belong to ${batchSession.user}",
+ s"$operator is not allowed to close the session belong to $owner",
Status.METHOD_NOT_ALLOWED)
}
+ }
+
+ def forceKill(
+ appMgrInfo: ApplicationManagerInfo,
+ batchId: String,
+ user: String): KillResponse = {
+ val (killed, message) = sessionManager.applicationManager
+ .killApplication(appMgrInfo, batchId, Some(user))
+ info(s"Mark batch[$batchId] closed by ${fe.connectionUrl}")
+ sessionManager.updateMetadata(Metadata(identifier = batchId, peerInstanceClosed = true))
+ (killed, message)
+ }
+
+ val sessionHandle = formatSessionHandle(batchId)
+ val userName = fe.getSessionUser(hs2ProxyUser)
+
+ sessionManager.getBatchSession(sessionHandle).map { batchSession =>
+ checkPermission(userName, batchSession.user)
sessionManager.closeSession(batchSession.handle)
- val (success, msg) = batchSession.batchJobSubmissionOp.getKillMessage
- new CloseBatchResponse(success, msg)
+ val (killed, msg) = batchSession.batchJobSubmissionOp.getKillMessage
+ new CloseBatchResponse(killed, msg)
}.getOrElse {
- Option(sessionManager.getBatchMetadata(batchId)).map { metadata =>
- if (userName != metadata.username) {
- throw new WebApplicationException(
- s"$userName is not allowed to close the session belong to ${metadata.username}",
- Status.METHOD_NOT_ALLOWED)
- } else if (OperationState.isTerminal(OperationState.withName(metadata.state)) ||
- metadata.kyuubiInstance == fe.connectionUrl) {
+ sessionManager.getBatchMetadata(batchId).map { metadata =>
+ checkPermission(userName, metadata.username)
+ if (OperationState.isTerminal(OperationState.withName(metadata.state))) {
new CloseBatchResponse(false, s"The batch[$metadata] has been terminated.")
- } else {
+ } else if (batchV2Enabled(metadata.requestConf) && metadata.state == "INITIALIZED") {
+ if (batchService.get.cancelUnscheduledBatch(batchId)) {
+ new CloseBatchResponse(true, s"Unscheduled batch $batchId is canceled.")
+ } else if (OperationState.isTerminal(OperationState.withName(metadata.state))) {
+ new CloseBatchResponse(false, s"The batch[$metadata] has been terminated.")
+ } else {
+ info(s"Cancel batch[$batchId] with state ${metadata.state} by killing application")
+ val (killed, msg) = forceKill(metadata.appMgrInfo, batchId, userName)
+ new CloseBatchResponse(killed, msg)
+ }
+ } else if (metadata.kyuubiInstance != fe.connectionUrl) {
info(s"Redirecting delete batch[$batchId] to ${metadata.kyuubiInstance}")
val internalRestClient = getInternalRestClient(metadata.kyuubiInstance)
try {
@@ -433,20 +495,13 @@ private[v1] class BatchesResource extends ApiRequestContext with Logging {
} catch {
case e: KyuubiRestException =>
error(s"Error redirecting delete batch[$batchId] to ${metadata.kyuubiInstance}", e)
- val appMgrKillResp = sessionManager.applicationManager.killApplication(
- metadata.clusterManager,
- batchId)
- info(
- s"Marking batch[$batchId/${metadata.kyuubiInstance}] closed by ${fe.connectionUrl}")
- sessionManager.updateMetadata(Metadata(
- identifier = batchId,
- peerInstanceClosed = true))
- if (appMgrKillResp._1) {
- new CloseBatchResponse(appMgrKillResp._1, appMgrKillResp._2)
- } else {
- new CloseBatchResponse(false, Utils.stringifyException(e))
- }
+ val (killed, msg) = forceKill(metadata.appMgrInfo, batchId, userName)
+ new CloseBatchResponse(killed, if (killed) msg else Utils.stringifyException(e))
}
+      } else { // should not happen, but handle it for safety
+        warn(s"Something went wrong while deleting batch[$batchId], trying to forcibly kill the application")
+ val (killed, msg) = forceKill(metadata.appMgrInfo, batchId, userName)
+ new CloseBatchResponse(killed, msg)
}
}.getOrElse {
error(s"Invalid batchId: $batchId")
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/OperationsResource.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/OperationsResource.scala
index 70a6d3a2848..fdde5bbc5b2 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/OperationsResource.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/OperationsResource.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.server.api.v1
-import javax.ws.rs._
+import javax.ws.rs.{BadRequestException, _}
import javax.ws.rs.core.{MediaType, Response}
import scala.collection.JavaConverters._
@@ -32,12 +32,13 @@ import org.apache.kyuubi.{KyuubiSQLException, Logging}
import org.apache.kyuubi.client.api.v1.dto._
import org.apache.kyuubi.events.KyuubiOperationEvent
import org.apache.kyuubi.operation.{FetchOrientation, KyuubiOperation, OperationHandle}
-import org.apache.kyuubi.server.api.ApiRequestContext
+import org.apache.kyuubi.server.api.{ApiRequestContext, ApiUtils}
@Tag(name = "Operation")
@Produces(Array(MediaType.APPLICATION_JSON))
@Consumes(Array(MediaType.APPLICATION_JSON))
private[v1] class OperationsResource extends ApiRequestContext with Logging {
+ import ApiUtils.logAndRefineErrorMsg
@ApiResponse(
responseCode = "200",
@@ -57,8 +58,7 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting an operation event"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -84,8 +84,7 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
case NonFatal(e) =>
val errorMsg =
s"Error applying ${request.getAction} for operation handle $operationHandleStr"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -109,7 +108,7 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
var scale = 0
if (tPrimitiveTypeEntry.getTypeQualifiers != null) {
val qualifiers = tPrimitiveTypeEntry.getTypeQualifiers.getQualifiers
- val defaultValue = TTypeQualifierValue.i32Value(0);
+ val defaultValue = TTypeQualifierValue.i32Value(0)
precision = qualifiers.getOrDefault("precision", defaultValue).getI32Value
scale = qualifiers.getOrDefault("scale", defaultValue).getI32Value
}
@@ -124,8 +123,7 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = s"Error getting result set metadata for operation handle $operationHandleStr"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -140,19 +138,26 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
@Path("{operationHandle}/log")
def getOperationLog(
@PathParam("operationHandle") operationHandleStr: String,
- @QueryParam("maxrows") maxRows: Int): OperationLog = {
+ @QueryParam("maxrows") @DefaultValue("100") maxRows: Int,
+ @QueryParam("fetchorientation") @DefaultValue("FETCH_NEXT")
+ fetchOrientation: String): OperationLog = {
try {
- val rowSet = fe.be.sessionManager.operationManager.getOperationLogRowSet(
+ if (fetchOrientation != "FETCH_NEXT" && fetchOrientation != "FETCH_FIRST") {
+ throw new BadRequestException(s"$fetchOrientation in operation log is not supported")
+ }
+ val fetchResultsResp = fe.be.sessionManager.operationManager.getOperationLogRowSet(
OperationHandle(operationHandleStr),
- FetchOrientation.FETCH_NEXT,
+ FetchOrientation.withName(fetchOrientation),
maxRows)
+ val rowSet = fetchResultsResp.getResults
val logRowSet = rowSet.getColumns.get(0).getStringVal.getValues.asScala
new OperationLog(logRowSet.asJava, logRowSet.size)
} catch {
+ case e: BadRequestException =>
+ throw e
case NonFatal(e) =>
val errorMsg = s"Error getting operation log for operation handle $operationHandleStr"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -171,11 +176,12 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
@QueryParam("fetchorientation") @DefaultValue("FETCH_NEXT")
fetchOrientation: String): ResultRowSet = {
try {
- val rowSet = fe.be.fetchResults(
+ val fetchResultsResp = fe.be.fetchResults(
OperationHandle(operationHandleStr),
FetchOrientation.withName(fetchOrientation),
maxRows,
fetchLog = false)
+ val rowSet = fetchResultsResp.getResults
val rows = rowSet.getRows.asScala.map(i => {
new Row(i.getColVals.asScala.map(i => {
new Field(
@@ -233,8 +239,7 @@ private[v1] class OperationsResource extends ApiRequestContext with Logging {
throw new BadRequestException(e.getMessage)
case NonFatal(e) =>
val errorMsg = s"Error getting result row set for operation handle $operationHandleStr"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/SessionsResource.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/SessionsResource.scala
index 81d1a27092f..10a55786798 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/SessionsResource.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/SessionsResource.scala
@@ -27,13 +27,14 @@ import scala.util.control.NonFatal
import io.swagger.v3.oas.annotations.media.{ArraySchema, Content, Schema}
import io.swagger.v3.oas.annotations.responses.ApiResponse
import io.swagger.v3.oas.annotations.tags.Tag
+import org.apache.commons.lang3.StringUtils
import org.apache.hive.service.rpc.thrift.{TGetInfoType, TProtocolVersion}
import org.apache.kyuubi.Logging
import org.apache.kyuubi.client.api.v1.dto
import org.apache.kyuubi.client.api.v1.dto._
import org.apache.kyuubi.config.KyuubiReservedKeys._
-import org.apache.kyuubi.operation.OperationHandle
+import org.apache.kyuubi.operation.{KyuubiOperation, OperationHandle}
import org.apache.kyuubi.server.api.{ApiRequestContext, ApiUtils}
import org.apache.kyuubi.session.{KyuubiSession, SessionHandle}
@@ -41,6 +42,8 @@ import org.apache.kyuubi.session.{KyuubiSession, SessionHandle}
@Produces(Array(MediaType.APPLICATION_JSON))
@Consumes(Array(MediaType.APPLICATION_JSON))
private[v1] class SessionsResource extends ApiRequestContext with Logging {
+ import ApiUtils.logAndRefineErrorMsg
+
implicit def toSessionHandle(str: String): SessionHandle = SessionHandle.fromUUID(str)
private def sessionManager = fe.be.sessionManager
@@ -52,9 +55,8 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
description = "get the list of all live sessions")
@GET
def sessions(): Seq[SessionData] = {
- sessionManager.allSessions().map { case session =>
- ApiUtils.sessionData(session.asInstanceOf[KyuubiSession])
- }.toSeq
+ sessionManager.allSessions()
+ .map(session => ApiUtils.sessionData(session.asInstanceOf[KyuubiSession])).toSeq
}
@ApiResponse(
@@ -85,12 +87,12 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
.startTime(event.startTime)
.endTime(event.endTime)
.totalOperations(event.totalOperations)
- .exception(event.exception.getOrElse(null))
+ .exception(event.exception.orNull)
.build).get
} catch {
case NonFatal(e) =>
- error(s"Invalid $sessionHandleStr", e)
- throw new NotFoundException(s"Invalid $sessionHandleStr")
+ val errorMsg = s"Invalid $sessionHandleStr"
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -112,8 +114,8 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
new InfoDetail(info.toString, infoValue.getStringValue)
} catch {
case NonFatal(e) =>
- error(s"Unrecognized GetInfoType value: $infoType", e)
- throw new NotFoundException(s"Unrecognized GetInfoType value: $infoType")
+ val errorMsg = s"Unrecognized GetInfoType value: $infoType"
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -172,6 +174,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
@DELETE
@Path("{sessionHandle}")
def closeSession(@PathParam("sessionHandle") sessionHandleStr: String): Response = {
+ info(s"Received request of closing $sessionHandleStr")
fe.be.closeSession(sessionHandleStr)
Response.ok().build()
}
@@ -197,8 +200,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error executing statement"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -216,8 +218,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting type information"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -235,8 +236,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting catalogs"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -260,8 +260,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting schemas"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -286,8 +285,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting tables"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -305,8 +303,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting table types"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -331,8 +328,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting columns"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -356,8 +352,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting functions"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -381,8 +376,7 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting primary keys"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
@@ -409,8 +403,33 @@ private[v1] class SessionsResource extends ApiRequestContext with Logging {
} catch {
case NonFatal(e) =>
val errorMsg = "Error getting cross reference"
- error(errorMsg, e)
- throw new NotFoundException(errorMsg)
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
+ }
+ }
+
+ @ApiResponse(
+ responseCode = "200",
+ content = Array(new Content(
+ mediaType = MediaType.APPLICATION_JSON,
+ array = new ArraySchema(schema = new Schema(implementation =
+ classOf[OperationData])))),
+ description =
+      "get the list of all operations belonging to the session")
+ @GET
+ @Path("{sessionHandle}/operations")
+ def getOperation(@PathParam("sessionHandle") sessionHandleStr: String): Seq[OperationData] = {
+ try {
+      fe.be.sessionManager.operationManager.allOperations()
+        .filter(operation =>
+          StringUtils.equalsIgnoreCase(
+            operation.getSession.handle.identifier.toString,
+            sessionHandleStr))
+        .map(operation => ApiUtils.operationData(operation.asInstanceOf[KyuubiOperation]))
+        .toSeq
+ } catch {
+ case NonFatal(e) =>
+        val errorMsg = "Error getting the list of operations belonging to the session"
+ throw new NotFoundException(logAndRefineErrorMsg(errorMsg, e))
}
}
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationFilter.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationFilter.scala
index 3c4065a7bdc..523d2490753 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationFilter.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationFilter.scala
@@ -22,7 +22,7 @@ import javax.security.sasl.AuthenticationException
import javax.servlet.{Filter, FilterChain, FilterConfig, ServletException, ServletRequest, ServletResponse}
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
-import scala.collection.mutable.HashMap
+import scala.collection.mutable
import org.apache.kyuubi.Logging
import org.apache.kyuubi.config.KyuubiConf
@@ -35,7 +35,8 @@ class AuthenticationFilter(conf: KyuubiConf) extends Filter with Logging {
import AuthenticationHandler._
import AuthSchemes._
- private[authentication] val authSchemeHandlers = new HashMap[AuthScheme, AuthenticationHandler]()
+ private[authentication] val authSchemeHandlers =
+ new mutable.HashMap[AuthScheme, AuthenticationHandler]()
private[authentication] def addAuthHandler(authHandler: AuthenticationHandler): Unit = {
authHandler.init(conf)
@@ -57,7 +58,7 @@ class AuthenticationFilter(conf: KyuubiConf) extends Filter with Logging {
val authTypes = conf.get(AUTHENTICATION_METHOD).map(AuthTypes.withName)
val spnegoKerberosEnabled = authTypes.contains(KERBEROS)
val basicAuthTypeOpt = {
- if (authTypes == Seq(NOSASL)) {
+ if (authTypes == Set(NOSASL)) {
authTypes.headOption
} else {
authTypes.filterNot(_.equals(KERBEROS)).filterNot(_.equals(NOSASL)).headOption
@@ -88,7 +89,7 @@ class AuthenticationFilter(conf: KyuubiConf) extends Filter with Logging {
/**
* If the request has a valid authentication token it allows the request to continue to the
* target resource, otherwise it triggers an authentication sequence using the configured
- * {@link AuthenticationHandler}.
+ * [[AuthenticationHandler]].
*
* @param request the request object.
* @param response the response object.
@@ -109,32 +110,31 @@ class AuthenticationFilter(conf: KyuubiConf) extends Filter with Logging {
HTTP_PROXY_HEADER_CLIENT_IP_ADDRESS.set(
httpRequest.getHeader(conf.get(FRONTEND_PROXY_HTTP_CLIENT_IP_HEADER)))
- if (matchedHandler == null) {
- debug(s"No auth scheme matched for url: ${httpRequest.getRequestURL}")
- httpResponse.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
- AuthenticationAuditLogger.audit(httpRequest, httpResponse)
- httpResponse.sendError(
- HttpServletResponse.SC_UNAUTHORIZED,
- s"No auth scheme matched for $authorization")
- } else {
- HTTP_AUTH_TYPE.set(matchedHandler.authScheme.toString)
- try {
+ try {
+ if (matchedHandler == null) {
+ debug(s"No auth scheme matched for url: ${httpRequest.getRequestURL}")
+ httpResponse.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
+ httpResponse.sendError(
+ HttpServletResponse.SC_UNAUTHORIZED,
+ s"No auth scheme matched for $authorization")
+ } else {
+ HTTP_AUTH_TYPE.set(matchedHandler.authScheme.toString)
val authUser = matchedHandler.authenticate(httpRequest, httpResponse)
if (authUser != null) {
HTTP_CLIENT_USER_NAME.set(authUser)
doFilter(filterChain, httpRequest, httpResponse)
}
- AuthenticationAuditLogger.audit(httpRequest, httpResponse)
- } catch {
- case e: AuthenticationException =>
- httpResponse.setStatus(HttpServletResponse.SC_FORBIDDEN)
- AuthenticationAuditLogger.audit(httpRequest, httpResponse)
- HTTP_CLIENT_USER_NAME.remove()
- HTTP_CLIENT_IP_ADDRESS.remove()
- HTTP_PROXY_HEADER_CLIENT_IP_ADDRESS.remove()
- HTTP_AUTH_TYPE.remove()
- httpResponse.sendError(HttpServletResponse.SC_FORBIDDEN, e.getMessage)
}
+ } catch {
+ case e: AuthenticationException =>
+ httpResponse.setStatus(HttpServletResponse.SC_FORBIDDEN)
+ HTTP_CLIENT_USER_NAME.remove()
+ HTTP_CLIENT_IP_ADDRESS.remove()
+ HTTP_PROXY_HEADER_CLIENT_IP_ADDRESS.remove()
+ HTTP_AUTH_TYPE.remove()
+ httpResponse.sendError(HttpServletResponse.SC_FORBIDDEN, e.getMessage)
+ } finally {
+ AuthenticationAuditLogger.audit(httpRequest, httpResponse)
}
}
@@ -158,7 +158,7 @@ class AuthenticationFilter(conf: KyuubiConf) extends Filter with Logging {
}
override def destroy(): Unit = {
- if (!authSchemeHandlers.isEmpty) {
+ if (authSchemeHandlers.nonEmpty) {
authSchemeHandlers.values.foreach(_.destroy())
authSchemeHandlers.clear()
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationHandler.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationHandler.scala
index acbc52f3531..bf2cb5bbecb 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationHandler.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/AuthenticationHandler.scala
@@ -46,14 +46,14 @@ trait AuthenticationHandler {
/**
* Destroys the authentication handler instance.
*
- * This method is invoked by the {@link AuthenticationFilter# destroy} method.
+ * This method is invoked by the [[AuthenticationFilter.destroy]] method.
*/
def destroy(): Unit
/**
* Performs an authentication step for the given HTTP client request.
*
- * This method is invoked by the {@link AuthenticationFilter} only if the HTTP client request is
+ * This method is invoked by the [[AuthenticationFilter]] only if the HTTP client request is
* not yet authenticated.
*
* Depending upon the authentication mechanism being implemented, a particular HTTP client may
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosAuthenticationHandler.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosAuthenticationHandler.scala
index 19a31feb6f3..04603f30a41 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosAuthenticationHandler.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosAuthenticationHandler.scala
@@ -46,7 +46,7 @@ class KerberosAuthenticationHandler extends AuthenticationHandler with Logging {
override val authScheme: AuthScheme = AuthSchemes.NEGOTIATE
override def authenticationSupported: Boolean = {
- !keytab.isEmpty && !principal.isEmpty
+ keytab.nonEmpty && principal.nonEmpty
}
override def init(conf: KyuubiConf): Unit = {
@@ -141,7 +141,7 @@ class KerberosAuthenticationHandler extends AuthenticationHandler with Logging {
GSSCredential.ACCEPT_ONLY)
gssContext = gssManager.createContext(gssCreds)
val serverToken = gssContext.acceptSecContext(clientToken, 0, clientToken.length)
- if (serverToken != null && serverToken.length > 0) {
+ if (serverToken != null && serverToken.nonEmpty) {
val authenticate = Base64.getEncoder.encodeToString(serverToken)
response.setHeader(WWW_AUTHENTICATE, s"$NEGOTIATE $authenticate")
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosUtil.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosUtil.scala
index 8ff079373ed..a5b95678c23 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosUtil.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/http/authentication/KerberosUtil.scala
@@ -201,7 +201,7 @@ object KerberosUtil {
val names = ticket.get(0xA2, 0x30, 0xA1, 0x30)
val sb = new StringBuilder
while (names.hasNext) {
- if (sb.length > 0) {
+ if (sb.nonEmpty) {
sb.append('/')
}
sb.append(names.next.getAsString)
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala
index 88a7f4e4ebd..1da9e1f3148 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala
@@ -124,32 +124,39 @@ class MetadataManager extends AbstractService("MetadataManager") {
}
}
- def getBatch(batchId: String): Batch = {
- Option(getBatchSessionMetadata(batchId)).map(buildBatch).orNull
+ def getBatch(batchId: String): Option[Batch] = {
+ getBatchSessionMetadata(batchId).map(buildBatch)
}
- def getBatchSessionMetadata(batchId: String): Metadata = {
- Option(withMetadataRequestMetrics(_metadataStore.getMetadata(batchId, true))).filter(
- _.sessionType == SessionType.BATCH).orNull
+ def getBatchSessionMetadata(batchId: String): Option[Metadata] = {
+ Option(withMetadataRequestMetrics(_metadataStore.getMetadata(batchId)))
+ .filter(_.sessionType == SessionType.BATCH)
}
- def getBatches(
+ def getBatches(filter: MetadataFilter, from: Int, size: Int): Seq[Batch] = {
+ withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size)).map(
+ buildBatch)
+ }
+
+ def countBatch(
batchType: String,
batchUser: String,
batchState: String,
- createTime: Long,
- endTime: Long,
- from: Int,
- size: Int): Seq[Batch] = {
+ kyuubiInstance: String): Int = {
val filter = MetadataFilter(
sessionType = SessionType.BATCH,
engineType = batchType,
username = batchUser,
state = batchState,
- createTime = createTime,
- endTime = endTime)
- withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size, true)).map(
- buildBatch)
+ kyuubiInstance = kyuubiInstance)
+ withMetadataRequestMetrics(_metadataStore.countMetadata(filter))
+ }
+
+ def pickBatchForSubmitting(kyuubiInstance: String): Option[Metadata] =
+ withMetadataRequestMetrics(_metadataStore.pickMetadata(kyuubiInstance))
+
+ def cancelUnscheduledBatch(batchId: String): Boolean = {
+ _metadataStore.transformMetadataState(batchId, "INITIALIZED", "CANCELED")
}
def getBatchesRecoveryMetadata(
@@ -161,7 +168,7 @@ class MetadataManager extends AbstractService("MetadataManager") {
sessionType = SessionType.BATCH,
state = state,
kyuubiInstance = kyuubiInstance)
- withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size, false))
+ withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size))
}
def getPeerInstanceClosedBatchesMetadata(
@@ -174,7 +181,7 @@ class MetadataManager extends AbstractService("MetadataManager") {
state = state,
kyuubiInstance = kyuubiInstance,
peerInstanceClosed = true)
- withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size, true))
+ withMetadataRequestMetrics(_metadataStore.getMetadataList(filter, from, size))
}
def updateMetadata(metadata: Metadata, asyncRetryOnError: Boolean = true): Unit = {
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataStore.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataStore.scala
index 4416c4a6dce..d8258816a45 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataStore.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataStore.scala
@@ -28,27 +28,45 @@ trait MetadataStore extends Closeable {
*/
def insertMetadata(metadata: Metadata): Unit
+ /**
+   * Find unscheduled batch job metadata and pick it up for submission.
+   * @param kyuubiInstance the Kyuubi instance that picks up the batch job
+   * @return the selected metadata for submission, or None if there is no candidate
+ */
+ def pickMetadata(kyuubiInstance: String): Option[Metadata]
+
+ /**
+   * Transition the metadata state from an existing state to a target state.
+ * @param identifier the identifier.
+ * @param fromState the desired current state
+ * @param targetState the desired target state
+   * @return `true` if the metadata state was the same as `fromState` and was successfully
+   *         transitioned to `targetState`, otherwise `false`
+ */
+ def transformMetadataState(identifier: String, fromState: String, targetState: String): Boolean
+
/**
* Get the persisted metadata by batch identifier.
* @param identifier the identifier.
- * @param stateOnly only return the state related column values.
* @return selected metadata.
*/
- def getMetadata(identifier: String, stateOnly: Boolean): Metadata
+ def getMetadata(identifier: String): Metadata
/**
* Get the metadata list with filter conditions, offset and size.
* @param filter the metadata filter conditions.
* @param from the metadata offset.
* @param size the size to get.
- * @param stateOnly only return the state related column values.
* @return selected metadata list.
*/
- def getMetadataList(
- filter: MetadataFilter,
- from: Int,
- size: Int,
- stateOnly: Boolean): Seq[Metadata]
+ def getMetadataList(filter: MetadataFilter, from: Int, size: Int): Seq[Metadata]
+
+ /**
+ * Count the metadata list with filter conditions.
+ * @param filter the metadata filter conditions.
+   * @return the count of metadata satisfying the filter conditions.
+ */
+ def countMetadata(filter: MetadataFilter): Int
/**
* Update the metadata according to identifier.
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/Metadata.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/Metadata.scala
index 949e88abdf1..3e3d9482841 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/Metadata.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/Metadata.scala
@@ -17,6 +17,11 @@
package org.apache.kyuubi.server.metadata.api
+import org.apache.kyuubi.config.KyuubiConf
+import org.apache.kyuubi.engine.{ApplicationManagerInfo, ApplicationState}
+import org.apache.kyuubi.engine.ApplicationState.ApplicationState
+import org.apache.kyuubi.operation.OperationState
+import org.apache.kyuubi.operation.OperationState.OperationState
import org.apache.kyuubi.session.SessionType.SessionType
/**
@@ -73,4 +78,18 @@ case class Metadata(
engineState: String = null,
engineError: Option[String] = None,
endTime: Long = 0L,
- peerInstanceClosed: Boolean = false)
+ peerInstanceClosed: Boolean = false) {
+ def appMgrInfo: ApplicationManagerInfo = {
+ ApplicationManagerInfo(
+ clusterManager,
+ requestConf.get(KyuubiConf.KUBERNETES_CONTEXT.key),
+ requestConf.get(KyuubiConf.KUBERNETES_NAMESPACE.key))
+ }
+
+ def opState: OperationState = {
+ assert(state != null, "invalid state, a normal batch record must have non-null state")
+ OperationState.withName(state)
+ }
+
+ def appState: Option[ApplicationState] = Option(engineState).map(ApplicationState.withName)
+}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/MetadataFilter.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/MetadataFilter.scala
index 6213f8e6433..d4f7f2b63d1 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/MetadataFilter.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/api/MetadataFilter.scala
@@ -27,6 +27,7 @@ case class MetadataFilter(
engineType: String = null,
username: String = null,
state: String = null,
+ requestName: String = null,
kyuubiInstance: String = null,
createTime: Long = 0L,
endTime: Long = 0L,
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/DatabaseType.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/DatabaseType.scala
index ef93f31c55f..67d6686d17e 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/DatabaseType.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/DatabaseType.scala
@@ -20,5 +20,5 @@ package org.apache.kyuubi.server.metadata.jdbc
object DatabaseType extends Enumeration {
type DatabaseType = Value
- val DERBY, MYSQL, CUSTOM = Value
+ val DERBY, MYSQL, CUSTOM, SQLITE = Value
}
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala
index 488039e2baa..dcb9c0f6685 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala
@@ -47,15 +47,17 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
private val dbType = DatabaseType.withName(conf.get(METADATA_STORE_JDBC_DATABASE_TYPE))
private val driverClassOpt = conf.get(METADATA_STORE_JDBC_DRIVER)
private val driverClass = dbType match {
+ case SQLITE => driverClassOpt.getOrElse("org.sqlite.JDBC")
case DERBY => driverClassOpt.getOrElse("org.apache.derby.jdbc.AutoloadedDriver")
case MYSQL => driverClassOpt.getOrElse("com.mysql.jdbc.Driver")
case CUSTOM => driverClassOpt.getOrElse(
throw new IllegalArgumentException("No jdbc driver defined"))
}
- private val databaseAdaptor = dbType match {
+ private val dialect = dbType match {
case DERBY => new DerbyDatabaseDialect
- case MYSQL => new MysqlDatabaseDialect
+ case SQLITE => new SQLiteDatabaseDialect
+ case MYSQL => new MySQLDatabaseDialect
case CUSTOM => new GenericDatabaseDialect
}
@@ -80,12 +82,14 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
private def initSchema(): Unit = {
getInitSchema(dbType).foreach { schema =>
- val ddlStatements = schema.trim.split(";")
+ val ddlStatements = schema.trim.split(";").map(_.trim)
JdbcUtils.withConnection { connection =>
Utils.tryLogNonFatalError {
ddlStatements.foreach { ddlStatement =>
execute(connection, ddlStatement)
- info(s"Execute init schema ddl: $ddlStatement successfully.")
+ info(s"""Execute init schema ddl successfully.
+ |$ddlStatement
+ |""".stripMargin)
}
}
}
@@ -190,34 +194,87 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
}
}
- override def getMetadata(identifier: String, stateOnly: Boolean): Metadata = {
- val query =
- if (stateOnly) {
- s"SELECT $METADATA_STATE_ONLY_COLUMNS FROM $METADATA_TABLE WHERE identifier = ?"
- } else {
- s"SELECT $METADATA_ALL_COLUMNS FROM $METADATA_TABLE WHERE identifier = ?"
+ override def pickMetadata(kyuubiInstance: String): Option[Metadata] = synchronized {
+ JdbcUtils.executeQueryWithRowMapper(
+ s"""SELECT identifier FROM $METADATA_TABLE
+ |WHERE state=?
+ |ORDER BY create_time ASC LIMIT 1
+ |""".stripMargin) { stmt =>
+ stmt.setString(1, OperationState.INITIALIZED.toString)
+ } { resultSet =>
+ resultSet.getString(1)
+ }.headOption.filter { preSelectedBatchId =>
+ JdbcUtils.executeUpdate(
+ s"""UPDATE $METADATA_TABLE
+ |SET kyuubi_instance=?, state=?
+ |WHERE identifier=? AND state=?
+ |""".stripMargin) { stmt =>
+ stmt.setString(1, kyuubiInstance)
+ stmt.setString(2, OperationState.PENDING.toString)
+ stmt.setString(3, preSelectedBatchId)
+ stmt.setString(4, OperationState.INITIALIZED.toString)
+ } == 1
+ }.map { pickedBatchId =>
+ getMetadata(pickedBatchId)
+ }
+ }
+
+ override def transformMetadataState(
+ identifier: String,
+ fromState: String,
+ targetState: String): Boolean = {
+ val query = s"UPDATE $METADATA_TABLE SET state = ? WHERE identifier = ? AND state = ?"
+ JdbcUtils.withConnection { connection =>
+ withUpdateCount(connection, query, fromState, identifier, targetState) { updateCount =>
+ updateCount == 1
}
+ }
+ }
+
+ override def getMetadata(identifier: String): Metadata = {
+ val query = s"SELECT $METADATA_COLUMNS FROM $METADATA_TABLE WHERE identifier = ?"
JdbcUtils.withConnection { connection =>
withResultSet(connection, query, identifier) { rs =>
- buildMetadata(rs, stateOnly).headOption.orNull
+ buildMetadata(rs).headOption.orNull
}
}
}
- override def getMetadataList(
- filter: MetadataFilter,
- from: Int,
- size: Int,
- stateOnly: Boolean): Seq[Metadata] = {
+ override def getMetadataList(filter: MetadataFilter, from: Int, size: Int): Seq[Metadata] = {
val queryBuilder = new StringBuilder
val params = ListBuffer[Any]()
- if (stateOnly) {
- queryBuilder.append(s"SELECT $METADATA_STATE_ONLY_COLUMNS FROM $METADATA_TABLE")
- } else {
- queryBuilder.append(s"SELECT $METADATA_ALL_COLUMNS FROM $METADATA_TABLE")
+ queryBuilder.append("SELECT ")
+ queryBuilder.append(METADATA_COLUMNS)
+ queryBuilder.append(s" FROM $METADATA_TABLE")
+ queryBuilder.append(s" ${assembleWhereClause(filter, params)}")
+ queryBuilder.append(" ORDER BY key_id ")
+ queryBuilder.append(dialect.limitClause(size, from))
+ val query = queryBuilder.toString
+ JdbcUtils.withConnection { connection =>
+ withResultSet(connection, query, params.toSeq: _*) { rs =>
+ buildMetadata(rs)
+ }
}
- val whereConditions = ListBuffer[String]()
+ }
+
+ override def countMetadata(filter: MetadataFilter): Int = {
+ val queryBuilder = new StringBuilder
+ val params = ListBuffer[Any]()
+ queryBuilder.append(s"SELECT COUNT(1) FROM $METADATA_TABLE")
+ queryBuilder.append(s" ${assembleWhereClause(filter, params)}")
+ val query = queryBuilder.toString
+ JdbcUtils.executeQueryWithRowMapper(query) { stmt =>
+ setStatementParams(stmt, params)
+ } { resultSet =>
+ resultSet.getInt(1)
+ }.head
+ }
+
+ private def assembleWhereClause(
+ filter: MetadataFilter,
+ params: ListBuffer[Any]): String = {
+ val whereConditions = ListBuffer[String]("1 = 1")
Option(filter.sessionType).foreach { sessionType =>
whereConditions += "session_type = ?"
params += sessionType.toString
@@ -234,6 +291,10 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
whereConditions += "state = ?"
params += state.toUpperCase(Locale.ROOT)
}
+ Option(filter.requestName).filter(_.nonEmpty).foreach { requestName =>
+ whereConditions += "request_name = ?"
+ params += requestName
+ }
Option(filter.kyuubiInstance).filter(_.nonEmpty).foreach { kyuubiInstance =>
whereConditions += "kyuubi_instance = ?"
params += kyuubiInstance
@@ -251,16 +312,7 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
whereConditions += "peer_instance_closed = ?"
params += filter.peerInstanceClosed
}
- if (whereConditions.nonEmpty) {
- queryBuilder.append(whereConditions.mkString(" WHERE ", " AND ", ""))
- }
- queryBuilder.append(" ORDER BY key_id")
- val query = databaseAdaptor.addLimitAndOffsetToQuery(queryBuilder.toString(), size, from)
- JdbcUtils.withConnection { connection =>
- withResultSet(connection, query, params: _*) { rs =>
- buildMetadata(rs, stateOnly)
- }
- }
+ whereConditions.mkString("WHERE ", " AND ", "")
}
override def updateMetadata(metadata: Metadata): Unit = {
@@ -269,10 +321,22 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
queryBuilder.append(s"UPDATE $METADATA_TABLE")
val setClauses = ListBuffer[String]()
+ Option(metadata.kyuubiInstance).foreach { _ =>
+ setClauses += "kyuubi_instance = ?"
+ params += metadata.kyuubiInstance
+ }
Option(metadata.state).foreach { _ =>
setClauses += "state = ?"
params += metadata.state
}
+ Option(metadata.requestConf).filter(_.nonEmpty).foreach { _ =>
+ setClauses += "request_conf = ?"
+ params += valueAsString(metadata.requestConf)
+ }
+ metadata.clusterManager.foreach { cm =>
+ setClauses += "cluster_manager = ?"
+ params += cm
+ }
if (metadata.endTime > 0) {
setClauses += "end_time = ?"
params += metadata.endTime
@@ -313,10 +377,11 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
val query = queryBuilder.toString()
JdbcUtils.withConnection { connection =>
- withUpdateCount(connection, query, params: _*) { updateCount =>
+ withUpdateCount(connection, query, params.toSeq: _*) { updateCount =>
if (updateCount == 0) {
throw new KyuubiException(
- s"Error updating metadata for ${metadata.identifier} with $query")
+ s"Error updating metadata for ${metadata.identifier} by SQL: $query, " +
+ s"with params: ${params.mkString(", ")}")
}
}
}
@@ -337,7 +402,7 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
}
}
- private def buildMetadata(resultSet: ResultSet, stateOnly: Boolean): Seq[Metadata] = {
+ private def buildMetadata(resultSet: ResultSet): Seq[Metadata] = {
try {
val metadataList = ListBuffer[Metadata]()
while (resultSet.next()) {
@@ -348,7 +413,11 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
val ipAddress = resultSet.getString("ip_address")
val kyuubiInstance = resultSet.getString("kyuubi_instance")
val state = resultSet.getString("state")
+ val resource = resultSet.getString("resource")
+ val className = resultSet.getString("class_name")
val requestName = resultSet.getString("request_name")
+ val requestConf = string2Map(resultSet.getString("request_conf"))
+ val requestArgs = string2Seq(resultSet.getString("request_args"))
val createTime = resultSet.getLong("create_time")
val engineType = resultSet.getString("engine_type")
val clusterManager = Option(resultSet.getString("cluster_manager"))
@@ -360,17 +429,6 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
val endTime = resultSet.getLong("end_time")
val peerInstanceClosed = resultSet.getBoolean("peer_instance_closed")
- var resource: String = null
- var className: String = null
- var requestConf: Map[String, String] = Map.empty
- var requestArgs: Seq[String] = Seq.empty
-
- if (!stateOnly) {
- resource = resultSet.getString("resource")
- className = resultSet.getString("class_name")
- requestConf = string2Map(resultSet.getString("request_conf"))
- requestArgs = string2Seq(resultSet.getString("request_args"))
- }
val metadata = Metadata(
identifier = identifier,
sessionType = sessionType,
@@ -396,7 +454,7 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
peerInstanceClosed = peerInstanceClosed)
metadataList += metadata
}
- metadataList
+ metadataList.toSeq
} finally {
Utils.tryLogNonFatalError(resultSet.close())
}
@@ -509,7 +567,7 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging {
object JDBCMetadataStore {
private val SCHEMA_URL_PATTERN = """^metadata-store-schema-(\d+)\.(\d+)\.(\d+)\.(.*)\.sql$""".r
private val METADATA_TABLE = "metadata"
- private val METADATA_STATE_ONLY_COLUMNS = Seq(
+ private val METADATA_COLUMNS = Seq(
"identifier",
"session_type",
"real_user",
@@ -517,7 +575,11 @@ object JDBCMetadataStore {
"ip_address",
"kyuubi_instance",
"state",
+ "resource",
+ "class_name",
"request_name",
+ "request_conf",
+ "request_args",
"create_time",
"engine_type",
"cluster_manager",
@@ -528,10 +590,4 @@ object JDBCMetadataStore {
"engine_error",
"end_time",
"peer_instance_closed").mkString(",")
- private val METADATA_ALL_COLUMNS = Seq(
- METADATA_STATE_ONLY_COLUMNS,
- "resource",
- "class_name",
- "request_conf",
- "request_args").mkString(",")
}
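The refactored `assembleWhereClause` above seeds the condition list with a constant `1 = 1` predicate, so the returned `WHERE` prefix stays valid even when no filter applies. A minimal standalone sketch of that pattern (the two filter fields here are illustrative, not Kyuubi's actual `MetadataFilter`):

```scala
import scala.collection.mutable.ListBuffer

// Build a parameterized WHERE clause from optional filter values.
// Seeding with "1 = 1" keeps the SQL valid when no filter is set.
def assembleWhere(state: Option[String], engineType: Option[String]): (String, Seq[Any]) = {
  val conditions = ListBuffer[String]("1 = 1")
  val params = ListBuffer[Any]()
  state.foreach { s => conditions += "state = ?"; params += s }
  engineType.foreach { e => conditions += "engine_type = ?"; params += e }
  (conditions.mkString("WHERE ", " AND ", ""), params.toSeq)
}
```

Because the seed condition always matches, the same `mkString("WHERE ", " AND ", "")` call works for zero or more filters, avoiding the old `if (whereConditions.nonEmpty)` branch.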
diff --git a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStoreConf.scala b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStoreConf.scala
index de30b6e6689..dd5d741382f 100644
--- a/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStoreConf.scala
+++ b/kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStoreConf.scala
@@ -17,7 +17,7 @@
package org.apache.kyuubi.server.metadata.jdbc
-import java.util.{Locale, Properties}
+import java.util.Properties
import org.apache.kyuubi.config.{ConfigEntry, KyuubiConf, OptionalConfigEntry}
import org.apache.kyuubi.config.KyuubiConf.buildConf
@@ -37,7 +37,9 @@ object JDBCMetadataStoreConf {
val METADATA_STORE_JDBC_DATABASE_TYPE: ConfigEntry[String] =
buildConf("kyuubi.metadata.store.jdbc.database.type")
.doc("The database type for server jdbc metadata store.
diff --git a/kyuubi-util-scala/pom.xml b/kyuubi-util-scala/pom.xml
new file mode 100644
index 00000000000..b97fc110b8b
--- /dev/null
+++ b/kyuubi-util-scala/pom.xml
@@ -0,0 +1,63 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.kyuubi</groupId>
+        <artifactId>kyuubi-parent</artifactId>
+        <version>1.9.0-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>kyuubi-util-scala_${scala.binary.version}</artifactId>
+    <packaging>jar</packaging>
+    <name>Kyuubi Project Util Scala</name>
+    <url>https://kyuubi.apache.org/</url>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.kyuubi</groupId>
+            <artifactId>kyuubi-util</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.scala-lang</groupId>
+            <artifactId>scala-library</artifactId>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-jar-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>prepare-test-jar</id>
+                        <goals>
+                            <goal>test-jar</goal>
+                        </goals>
+                        <phase>test-compile</phase>
+                    </execution>
+                </executions>
+            </plugin>
+        </plugins>
+        <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
+        <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
+    </build>
+</project>
diff --git a/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/EnumUtils.scala b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/EnumUtils.scala
new file mode 100644
index 00000000000..0ab3be381c2
--- /dev/null
+++ b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/EnumUtils.scala
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.util
+
+import scala.util.Try
+
+object EnumUtils {
+ def isValidEnum(enumeration: Enumeration, enumName: Any): Boolean = Try {
+ enumeration.withName(enumName.toString)
+ }.isSuccess
+
+ def isValidEnums(enumeration: Enumeration, enumNames: Iterable[Any]): Boolean =
+ enumNames.forall(isValidEnum(enumeration, _))
+}
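A quick usage sketch of the `EnumUtils` check against a sample `Enumeration` (the `Color` enum is illustrative; the helper mirrors `isValidEnum` above, which treats a name as valid iff `withName` does not throw):

```scala
import scala.util.Try

// Mirror of EnumUtils.isValidEnum: valid iff Enumeration.withName succeeds.
def isValidEnum(enumeration: Enumeration, enumName: Any): Boolean =
  Try(enumeration.withName(enumName.toString)).isSuccess

// Sample enumeration used only for demonstration.
object Color extends Enumeration {
  val Red, Green, Blue = Value
}
```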
diff --git a/kyuubi-common/src/main/scala/org/apache/kyuubi/engine/SemanticVersion.scala b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/SemanticVersion.scala
similarity index 51%
rename from kyuubi-common/src/main/scala/org/apache/kyuubi/engine/SemanticVersion.scala
rename to kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/SemanticVersion.scala
index a36896ac928..ba0ae8910fd 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/engine/SemanticVersion.scala
+++ b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/SemanticVersion.scala
@@ -15,13 +15,15 @@
* limitations under the License.
*/
-package org.apache.kyuubi.engine
+package org.apache.kyuubi.util
/**
- * Encapsulate a component (Kyuubi/Spark/Hive/Flink etc.) version
- * for the convenience of version checks.
+ * Encapsulate a component version for the convenience of version checks.
*/
-case class SemanticVersion(majorVersion: Int, minorVersion: Int) {
+case class SemanticVersion(majorVersion: Int, minorVersion: Int)
+ extends Comparable[SemanticVersion] {
+
+ def ===(targetVersionString: String): Boolean = isVersionEqualTo(targetVersionString)
def <=(targetVersionString: String): Boolean = isVersionAtMost(targetVersionString)
@@ -31,49 +33,46 @@ case class SemanticVersion(majorVersion: Int, minorVersion: Int) {
def <(targetVersionString: String): Boolean = !isVersionAtLeast(targetVersionString)
- def isVersionAtMost(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor < targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor <= targetMinor
- })
- }
+ def isVersionAtMost(targetVersionString: String): Boolean =
+ compareTo(SemanticVersion(targetVersionString)) <= 0
- def isVersionAtLeast(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- (runtimeMajor > targetMajor) || {
- runtimeMajor == targetMajor && runtimeMinor >= targetMinor
- })
- }
+ def isVersionAtLeast(targetVersionString: String): Boolean =
+ compareTo(SemanticVersion(targetVersionString)) >= 0
- def isVersionEqualTo(targetVersionString: String): Boolean = {
- this.compareVersion(
- targetVersionString,
- (targetMajor: Int, targetMinor: Int, runtimeMajor: Int, runtimeMinor: Int) =>
- runtimeMajor == targetMajor && runtimeMinor == targetMinor)
- }
+ def isVersionEqualTo(targetVersionString: String): Boolean =
+ compareTo(SemanticVersion(targetVersionString)) == 0
- def compareVersion(
- targetVersionString: String,
- callback: (Int, Int, Int, Int) => Boolean): Boolean = {
- val targetVersion = SemanticVersion(targetVersionString)
- val targetMajor = targetVersion.majorVersion
- val targetMinor = targetVersion.minorVersion
- callback(targetMajor, targetMinor, this.majorVersion, this.minorVersion)
+ override def compareTo(v: SemanticVersion): Int = {
+ if (majorVersion > v.majorVersion) {
+ 1
+ } else if (majorVersion < v.majorVersion) {
+ -1
+ } else {
+ minorVersion - v.minorVersion
+ }
}
override def toString: String = s"$majorVersion.$minorVersion"
+
+ /**
+ * Returns a double in the format of "majorVersion.minorVersion".
+ * Note: not suitable for version comparison, only for logging.
+ * @return
+ */
+ def toDouble: Double = toString.toDouble
+
}
object SemanticVersion {
+ private val semanticVersionRegex = """^(\d+)(?:\.(\d+))?(\..*)?$""".r
+
def apply(versionString: String): SemanticVersion = {
- """^(\d+)\.(\d+)(\..*)?$""".r.findFirstMatchIn(versionString) match {
+ semanticVersionRegex.findFirstMatchIn(versionString) match {
case Some(m) =>
- SemanticVersion(m.group(1).toInt, m.group(2).toInt)
+ val major = m.group(1).toInt
+ val minor = Option(m.group(2)).getOrElse("0").toInt
+ SemanticVersion(major, minor)
case None =>
throw new IllegalArgumentException(s"Tried to parse '$versionString' as a project" +
s" version string, but it could not find the major and minor version numbers.")
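The new `semanticVersionRegex` makes the minor component optional, so a major-only string such as `"3"` now parses as `3.0`. A simplified standalone replica of the parse-and-compare logic (a sketch, not the Kyuubi class itself):

```scala
// Optional minor version group; trailing ".anything" is ignored.
val semverRegex = """^(\d+)(?:\.(\d+))?(\..*)?$""".r

// Parse "major[.minor][.rest]" into (major, minor); minor defaults to 0.
def parse(version: String): (Int, Int) = version match {
  case semverRegex(major, minor, _) => (major.toInt, Option(minor).getOrElse("0").toInt)
  case _ => throw new IllegalArgumentException(s"Cannot parse '$version'")
}

// Same ordering as SemanticVersion.compareTo: major first, then minor.
def compareVersions(a: (Int, Int), b: (Int, Int)): Int =
  if (a._1 != b._1) a._1.compare(b._1) else a._2 - b._2
```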
diff --git a/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/reflect/ReflectUtils.scala b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/reflect/ReflectUtils.scala
new file mode 100644
index 00000000000..08916b8d150
--- /dev/null
+++ b/kyuubi-util-scala/src/main/scala/org/apache/kyuubi/util/reflect/ReflectUtils.scala
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.util.reflect
+
+import java.util.ServiceLoader
+
+import scala.collection.JavaConverters._
+import scala.language.existentials
+import scala.reflect.ClassTag
+import scala.util.Try
+
+object ReflectUtils {
+
+ /**
+ * Determines whether the provided class is loadable
+ * @param className the class name
+ * @param cl the class loader
+ * @return is the class name loadable with the class loader
+ */
+ def isClassLoadable(
+ className: String,
+ cl: ClassLoader = Thread.currentThread().getContextClassLoader): Boolean =
+ Try {
+ DynClasses.builder().loader(cl).impl(className).buildChecked()
+ }.isSuccess
+
+ /**
+ * get the field value of the given object
+ * @param target the target object
+ * @param fieldName the field name from declared field names
+ * @tparam T the expected return class type
+ * @return
+ */
+ def getField[T](target: AnyRef, fieldName: String): T = {
+ val (clz, obj) = getTargetClass(target)
+ try {
+ val field = DynFields.builder
+ .hiddenImpl(clz, fieldName)
+ .impl(clz, fieldName)
+ .build[T]
+ if (field.isStatic) {
+ field.asStatic.get
+ } else {
+ field.bind(obj).get
+ }
+ } catch {
+ case e: Exception =>
+ val candidates =
+ (clz.getDeclaredFields ++ clz.getFields).map(_.getName).distinct.sorted
+ throw new RuntimeException(
+ s"Field $fieldName not in $clz [${candidates.mkString(",")}]",
+ e)
+ }
+ }
+
+ /**
+ * Invoke a method with the given name and arguments on the given target object.
+ * @param target the target object
+ * @param methodName the method name from declared method names
+ * @param args pairs of class and values for the arguments
+ * @tparam T the expected return class type,
+ * returning type Nothing if it's not provided or inferable
+ * @return
+ */
+ def invokeAs[T](target: AnyRef, methodName: String, args: (Class[_], AnyRef)*): T = {
+ val (clz, obj) = getTargetClass(target)
+ val argClasses = args.map(_._1)
+ try {
+ val method = DynMethods.builder(methodName)
+ .hiddenImpl(clz, argClasses: _*)
+ .impl(clz, argClasses: _*)
+ .buildChecked
+ if (method.isStatic) {
+ method.asStatic.invoke[T](args.map(_._2): _*)
+ } else {
+ method.bind(obj).invoke[T](args.map(_._2): _*)
+ }
+ } catch {
+ case e: Exception =>
+ val candidates =
+ (clz.getDeclaredMethods ++ clz.getMethods)
+ .map(m => s"${m.getName}(${m.getParameterTypes.map(_.getName).mkString(", ")})")
+ .distinct.sorted
+ val argClassesNames = argClasses.map(_.getName)
+ throw new RuntimeException(
+ s"Method $methodName(${argClassesNames.mkString(", ")})" +
+ s" not found in $clz [${candidates.mkString(", ")}]",
+ e)
+ }
+ }
+
+ /**
+ * Creates a iterator for with a new service loader for the given service type and class
+ * loader.
+ *
+ * @param cl The class loader to be used to load provider-configuration files
+ * and provider classes
+ * @param ct class tag of the generics class type
+ * @tparam T the class of the service type
+ * @return
+ */
+ def loadFromServiceLoader[T](cl: ClassLoader = Thread.currentThread().getContextClassLoader)(
+ implicit ct: ClassTag[T]): Iterator[T] =
+ ServiceLoader.load(ct.runtimeClass, cl).iterator().asScala.map(_.asInstanceOf[T])
+
+ private def getTargetClass(target: AnyRef): (Class[_], AnyRef) = target match {
+ case clz: Class[_] => (clz, None)
+ case clzName: String => (DynClasses.builder.impl(clzName).buildChecked, None)
+ case (clz: Class[_], o: AnyRef) => (clz, o)
+ case (clzName: String, o: AnyRef) => (DynClasses.builder.impl(clzName).buildChecked, o)
+ case o => (o.getClass, o)
+ }
+}
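`getField` and `invokeAs` above delegate to Kyuubi's `DynFields`/`DynMethods` builders; the underlying idea, resolving a possibly-private member and binding it to the target instance, can be sketched with plain `java.lang.reflect` (simplified: bound instance members only, no static handling; the `Sample` class is illustrative):

```scala
// Resolve a declared (possibly private) field and read it from the target.
def getFieldValue[T](target: AnyRef, fieldName: String): T = {
  val field = target.getClass.getDeclaredField(fieldName)
  field.setAccessible(true) // allow access to private members
  field.get(target).asInstanceOf[T]
}

// Resolve a no-arg method by name and invoke it on the target.
def invokeNoArg[T](target: AnyRef, methodName: String): T = {
  val method = target.getClass.getDeclaredMethod(methodName)
  method.setAccessible(true)
  method.invoke(target).asInstanceOf[T]
}

class Sample {
  private val secret: String = "field1"
  private def greet(): String = "method1"
}
```

The real utility additionally walks public members, supports static and class/name targets via `getTargetClass`, and reports candidate member names on failure.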
diff --git a/kyuubi-common/src/test/java/org/apache/kyuubi/tags/DeltaTest.java b/kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/DeltaTest.java
similarity index 100%
rename from kyuubi-common/src/test/java/org/apache/kyuubi/tags/DeltaTest.java
rename to kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/DeltaTest.java
diff --git a/kyuubi-common/src/test/java/org/apache/kyuubi/tags/IcebergTest.java b/kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/IcebergTest.java
similarity index 100%
rename from kyuubi-common/src/test/java/org/apache/kyuubi/tags/IcebergTest.java
rename to kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/IcebergTest.java
diff --git a/kyuubi-common/src/test/java/org/apache/kyuubi/tags/PySparkTest.java b/kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/PySparkTest.java
similarity index 100%
rename from kyuubi-common/src/test/java/org/apache/kyuubi/tags/PySparkTest.java
rename to kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/PySparkTest.java
diff --git a/kyuubi-common/src/test/java/org/apache/kyuubi/tags/HudiTest.java b/kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/SparkLocalClusterTest.java
similarity index 96%
rename from kyuubi-common/src/test/java/org/apache/kyuubi/tags/HudiTest.java
rename to kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/SparkLocalClusterTest.java
index 346f146faf2..dd718f125c3 100644
--- a/kyuubi-common/src/test/java/org/apache/kyuubi/tags/HudiTest.java
+++ b/kyuubi-util-scala/src/test/java/org/apache/kyuubi/tags/SparkLocalClusterTest.java
@@ -26,4 +26,4 @@
@TagAnnotation
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
-public @interface HudiTest {}
+public @interface SparkLocalClusterTest {}
diff --git a/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/AssertionUtils.scala b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/AssertionUtils.scala
new file mode 100644
index 00000000000..2e9a0d3fe2d
--- /dev/null
+++ b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/AssertionUtils.scala
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.util
+
+import java.nio.charset.StandardCharsets
+import java.nio.file.Path
+import java.util.Locale
+
+import scala.collection.Traversable
+import scala.io.Source
+import scala.reflect.ClassTag
+
+import org.scalactic.{source, Prettifier}
+import org.scalatest.Assertions._
+
+object AssertionUtils {
+
+ def assertEqualsIgnoreCase(expected: AnyRef)(actual: AnyRef)(
+ implicit pos: source.Position): Unit = {
+ val isEqualsIgnoreCase = (Option(expected), Option(actual)) match {
+ case (Some(expectedStr: String), Some(actualStr: String)) =>
+ expectedStr.equalsIgnoreCase(actualStr)
+ case (Some(Some(expectedStr: String)), Some(Some(actualStr: String))) =>
+ expectedStr.equalsIgnoreCase(actualStr)
+ case (None, None) => true
+ case _ => false
+ }
+ if (!isEqualsIgnoreCase) {
+ fail(s"Expected equaling to '$expected' ignoring case, but got '$actual'")(pos)
+ }
+ }
+
+ def assertStartsWithIgnoreCase(expectedPrefix: String)(actual: String)(
+ implicit pos: source.Position): Unit = {
+ if (!actual.toLowerCase(Locale.ROOT).startsWith(expectedPrefix.toLowerCase(Locale.ROOT))) {
+ fail(s"Expected starting with '$expectedPrefix' ignoring case, but got [$actual]")(pos)
+ }
+ }
+
+ def assertExistsIgnoreCase(expected: String)(actual: Iterable[String])(
+ implicit pos: source.Position): Unit = {
+ if (!actual.exists(_.equalsIgnoreCase(expected))) {
+ fail(s"Expected containing '$expected' ignoring case, but got [$actual]")(pos)
+ }
+ }
+
+ /**
+ * Assert the file content is equal to the expected lines.
+ * If not, throws an assertion error with the given regeneration hint.
+ * @param expectedLines expected lines
+ * @param path source file path
+ * @param regenScript regeneration script
+ * @param splitFirstExpectedLine whether to split the first expected line
+ * into multiple lines by EOL
+ */
+ def assertFileContent(
+ path: Path,
+ expectedLines: Traversable[String],
+ regenScript: String,
+ splitFirstExpectedLine: Boolean = false)(implicit
+ prettifier: Prettifier,
+ pos: source.Position): Unit = {
+ val fileSource = Source.fromFile(path.toUri, StandardCharsets.UTF_8.name())
+ try {
+ def expectedLinesIter = if (splitFirstExpectedLine) {
+ Source.fromString(expectedLines.head).getLines()
+ } else {
+ expectedLines.toIterator
+ }
+ val fileLinesIter = fileSource.getLines()
+ val regenerationHint = s"The file ($path) is out of date. " + {
+ if (regenScript != null && regenScript.nonEmpty) {
+ s" Please regenerate it by running `${regenScript.stripMargin}`. "
+ } else ""
+ }
+ var fileLineCount = 0
+ fileLinesIter.zipWithIndex.zip(expectedLinesIter)
+ .foreach { case ((lineInFile, lineIndex), expectedLine) =>
+ val lineNum = lineIndex + 1
+ withClue(s"Line $lineNum is not expected. $regenerationHint") {
+ assertResult(expectedLine)(lineInFile)(prettifier, pos)
+ }
+ fileLineCount = Math.max(lineNum, fileLineCount)
+ }
+ withClue(s"Line number is not expected. $regenerationHint") {
+ assertResult(expectedLinesIter.size)(fileLineCount)(prettifier, pos)
+ }
+ } finally {
+ fileSource.close()
+ }
+ }
+
+ /**
+ * Asserts that the given function throws an exception of the given type
+ * and with the exception message equal to the expected string
+ */
+ def interceptEquals[T <: Exception](f: => Any)(expected: String)(implicit
+ classTag: ClassTag[T],
+ pos: source.Position): Unit = {
+ assert(expected != null)
+ val exception = intercept[T](f)(classTag, pos)
+ assertResult(expected)(exception.getMessage)
+ }
+
+ /**
+ * Asserts that the given function throws an exception of the given type
+ * and with the exception message containing the expected string
+ */
+ def interceptContains[T <: Exception](f: => Any)(contained: String)(implicit
+ classTag: ClassTag[T],
+ pos: source.Position): Unit = {
+ assert(contained != null)
+ val exception = intercept[T](f)(classTag, pos)
+ assert(exception.getMessage.contains(contained))
+ }
+}
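`interceptEquals`/`interceptContains` wrap ScalaTest's `intercept` to also check the exception message. Outside ScalaTest the same check reduces to a small try/catch (a sketch under that assumption, not the ScalaTest-backed version above):

```scala
import scala.reflect.ClassTag

// Run f, require that it throws T, then require the message contains the fragment.
def interceptContains[T <: Exception](f: => Any)(contained: String)(
    implicit ct: ClassTag[T]): Unit = {
  val thrown =
    try { f; None }
    catch { case e: Throwable if ct.runtimeClass.isInstance(e) => Some(e) }
  thrown match {
    case Some(e) =>
      assert(e.getMessage.contains(contained),
        s"message '${e.getMessage}' does not contain '$contained'")
    case None =>
      throw new AssertionError(s"Expected ${ct.runtimeClass.getName} but nothing was thrown")
  }
}
```

Exceptions of the wrong type deliberately fall through the catch clause and propagate, matching `intercept` semantics.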
diff --git a/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/GoldenFileUtils.scala b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/GoldenFileUtils.scala
new file mode 100644
index 00000000000..e9927f7e23e
--- /dev/null
+++ b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/GoldenFileUtils.scala
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.util
+
+import java.nio.file.{Files, Path, StandardOpenOption}
+
+import scala.collection.JavaConverters._
+
+import org.apache.kyuubi.util.AssertionUtils._
+
+object GoldenFileUtils {
+ def isRegenerateGoldenFiles: Boolean = sys.env.get("KYUUBI_UPDATE").contains("1")
+
+ /**
+ * Verify the golden file content when the KYUUBI_UPDATE env is not equal to 1,
+ * or regenerate the golden file content when the KYUUBI_UPDATE env is equal to 1.
+ *
+ * @param path the path of file
+ * @param lines the expected lines for validation or regeneration
+ * @param regenScript the script for regeneration, used for hints when verification failed
+ */
+ def verifyOrRegenerateGoldenFile(
+ path: Path,
+ lines: Iterable[String],
+ regenScript: String): Unit = {
+ if (isRegenerateGoldenFiles) {
+ Files.write(
+ path,
+ lines.asJava,
+ StandardOpenOption.CREATE,
+ StandardOpenOption.TRUNCATE_EXISTING)
+ } else {
+ assertFileContent(path, lines, regenScript)
+ }
+ }
+}
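The verify-or-regenerate flow above can be exercised end to end against a temp file. In this sketch the `KYUUBI_UPDATE` env gate is replaced by an explicit `regenerate` parameter for testability (names are illustrative):

```scala
import java.nio.file.{Files, Path}
import java.util.{Arrays => JArrays}

// Regenerate the golden file when requested, otherwise compare line by line.
def verifyOrRegenerate(path: Path, lines: Seq[String], regenerate: Boolean): Unit = {
  if (regenerate) {
    Files.write(path, JArrays.asList(lines: _*))
  } else {
    val actual = Files.readAllLines(path)
    assert(
      actual.size == lines.size && lines.indices.forall(i => actual.get(i) == lines(i)),
      s"Golden file $path is out of date; regenerate it")
  }
}
```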
diff --git a/kyuubi-common/src/test/scala/org/apache/kyuubi/engine/SemanticVersionSuite.scala b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/SemanticVersionSuite.scala
similarity index 64%
rename from kyuubi-common/src/test/scala/org/apache/kyuubi/engine/SemanticVersionSuite.scala
rename to kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/SemanticVersionSuite.scala
index 28427bdc8b2..6cad67993f3 100644
--- a/kyuubi-common/src/test/scala/org/apache/kyuubi/engine/SemanticVersionSuite.scala
+++ b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/SemanticVersionSuite.scala
@@ -15,12 +15,17 @@
* limitations under the License.
*/
-package org.apache.kyuubi.engine
+package org.apache.kyuubi.util
+// scalastyle:off
-import org.apache.kyuubi.KyuubiFunSuite
+import org.scalatest.funsuite.AnyFunSuite
-class SemanticVersionSuite extends KyuubiFunSuite {
+import org.apache.kyuubi.util.AssertionUtils._
+// scalastyle:on
+// scalastyle:off
+class SemanticVersionSuite extends AnyFunSuite {
+// scalastyle:on
test("parse normal version") {
val version = SemanticVersion("1.12.4")
assert(version.majorVersion === 1)
@@ -39,6 +44,15 @@ class SemanticVersionSuite extends KyuubiFunSuite {
assert(version.minorVersion === 9)
}
+ test("reject parsing illegal formatted version") {
+ interceptContains[IllegalArgumentException](SemanticVersion("v1.0"))(
+ "Tried to parse 'v1.0' as a project version string, " +
+ "but it could not find the major and minor version numbers")
+ interceptContains[IllegalArgumentException](SemanticVersion(".1.0"))(
+ "Tried to parse '.1.0' as a project version string, " +
+ "but it could not find the major and minor version numbers")
+ }
+
test("companion class compare version at most") {
assert(SemanticVersion("1.12").isVersionAtMost("2.8.8-SNAPSHOT"))
val runtimeVersion = SemanticVersion("1.12.4")
@@ -71,4 +85,28 @@ class SemanticVersionSuite extends KyuubiFunSuite {
assert(!runtimeVersion.isVersionEqualTo("1.10.4"))
assert(!runtimeVersion.isVersionEqualTo("2.12.8"))
}
+
+ test("compare version to major version only") {
+ val versionFromMajorOnly = SemanticVersion("3")
+ assert(versionFromMajorOnly === "3.0")
+ assert(versionFromMajorOnly < "3.1")
+ assert(!(versionFromMajorOnly > "3.0"))
+
+ val runtimeVersion = SemanticVersion("2.3.4")
+ assert(runtimeVersion > "1")
+ assert(runtimeVersion > "2")
+ assert(runtimeVersion >= "2")
+ assert(!(runtimeVersion === "2"))
+ assert(runtimeVersion < "3")
+ assert(runtimeVersion <= "4")
+ }
+
+ test("semantic version to double") {
+ assertResult(1.0d)(SemanticVersion("1").toDouble)
+ assertResult(1.2d)(SemanticVersion("1.2").toDouble)
+ assertResult(1.2d)(SemanticVersion("1.2.3").toDouble)
+ assertResult(1.2d)(SemanticVersion("1.2.3-SNAPSHOT").toDouble)
+ assertResult(1.234d)(SemanticVersion("1.234").toDouble)
+ assertResult(1.234d)(SemanticVersion("1.234.567").toDouble)
+ }
}
diff --git a/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/reflect/ReflectUtilsSuite.scala b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/reflect/ReflectUtilsSuite.scala
new file mode 100644
index 00000000000..626aeebe226
--- /dev/null
+++ b/kyuubi-util-scala/src/test/scala/org/apache/kyuubi/util/reflect/ReflectUtilsSuite.scala
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kyuubi.util.reflect
+
+// scalastyle:off
+import org.scalatest.funsuite.AnyFunSuite
+
+import org.apache.kyuubi.util.AssertionUtils._
+import org.apache.kyuubi.util.reflect.ReflectUtils._
+class ReflectUtilsSuite extends AnyFunSuite {
+ // scalastyle:on
+
+ private val obj1 = new ClassA
+ private val obj2 = new ClassB
+
+ test("check class loadable") {
+ assert(isClassLoadable(getClass.getName))
+ assert(!isClassLoadable("org.apache.kyuubi.NonExistClass"))
+ }
+
+ test("check invokeAs on base class") {
+ assertResult("method1")(invokeAs[String](obj1, "method1"))
+ assertResult("method2")(invokeAs[String](obj1, "method2"))
+ }

+  test("check invokeAs on subclass") {
+ assertResult("method1")(invokeAs[String](obj2, "method1"))
+ assertResult("method2")(invokeAs[String]((classOf[ClassA], obj2), "method2"))
+ assertResult("method3")(invokeAs[String](obj2, "method3"))
+ assertResult("method4")(invokeAs[String](obj2, "method4"))
+ }
+
+ test("check invokeAs on object and static class") {
+ assertResult("method5")(invokeAs[String](ObjectA, "method5"))
+ assertResult("method6")(invokeAs[String](ObjectA, "method6"))
+ assertResult("method5")(invokeAs[String]("org.apache.kyuubi.util.reflect.ObjectA", "method5"))
+ }
+
+ test("check getField on base class") {
+ assertResult("field0")(getField[String](obj1, "field0"))
+ assertResult("field1")(getField[String](obj1, "field1"))
+ assertResult("field2")(getField[String](obj1, "field2"))
+ }
+
+ test("check getField on subclass") {
+ assertResult("field0")(getField[String]((classOf[ClassA], obj2), "field0"))
+ assertResult("field1")(getField[String]((classOf[ClassA], obj2), "field1"))
+ assertResult("field2")(getField[String]((classOf[ClassA], obj2), "field2"))
+ assertResult("field3")(getField[String](obj2, "field3"))
+ assertResult("field4")(getField[String](obj2, "field4"))
+ }
+
+ test("check getField on object and static class") {
+ assertResult("field5")(getField[String](ObjectA, "field5"))
+ assertResult("field6")(getField[String](ObjectA, "field6"))
+ }
+
+ test("test invokeAs method not found exception") {
+ interceptEquals[RuntimeException] {
+ invokeAs[String](
+ ObjectA,
+ "methodNotExists",
+ (classOf[String], "arg1"),
+ (classOf[String], "arg2"))
+ }("Method methodNotExists(java.lang.String, java.lang.String) not found " +
+ "in class org.apache.kyuubi.util.reflect.ObjectA$ " +
+ "[equals(java.lang.Object), field5(), field6(), getClass(), hashCode(), method5(), " +
+ "method6(), notify(), notifyAll(), toString(), wait(), wait(long), wait(long, int)]")
+ }
+}
+
+class ClassA(val field0: String = "field0") {
+ val field1 = "field1"
+ private val field2 = "field2"
+
+ def method1(): String = "method1"
+ private def method2(): String = "method2"
+}
+
+class ClassB extends ClassA {
+ val field3 = "field3"
+ private val field4 = "field4"
+
+ def method3(): String = "method3"
+ private def method4(): String = "method4"
+}
+
+object ObjectA {
+ val field5 = "field5"
+ private val field6 = "field6"
+
+ def method5(): String = "method5"
+ private def method6(): String = "method6"
+}
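
The suite above exercises reflective access to possibly private methods and fields, with an explicit `(classOf[ClassA], obj2)` tuple when a member must be resolved against a specific superclass. One common way to implement such a lookup (here walking the hierarchy automatically rather than taking an explicit class) can be sketched in plain `java.lang.reflect` terms — this is an illustrative assumption, not Kyuubi's `ReflectUtils` code:

```java
import java.lang.reflect.Method;

// Hypothetical sketch: find a (possibly private) zero-arg method anywhere in
// the target's class hierarchy, make it accessible, and invoke it.
public class InvokeSketch {
    static Object invokeAs(Object target, String name) throws Exception {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(name);
                m.setAccessible(true); // allow calling private members
                return m.invoke(target);
            } catch (NoSuchMethodException ignored) {
                // not declared on this class; keep searching up the hierarchy
            }
        }
        throw new RuntimeException("Method " + name + "() not found");
    }

    // Mirrors the shape of the ClassA/ClassB fixtures above.
    static class ClassA {
        private String method2() { return "method2"; }
    }

    static class ClassB extends ClassA {
        private String method4() { return "method4"; }
    }
}
```

`getDeclaredMethod` (unlike `getMethod`) sees private members but only on the exact class, which is why the loop over `getSuperclass()` — or an explicit target class, as the suite passes — is needed for inherited private members.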
diff --git a/kyuubi-util/pom.xml b/kyuubi-util/pom.xml
new file mode 100644
index 00000000000..19a4f854345
--- /dev/null
+++ b/kyuubi-util/pom.xml
@@ -0,0 +1,76 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.kyuubi</groupId>
+        <artifactId>kyuubi-parent</artifactId>
+        <version>1.9.0-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>kyuubi-util</artifactId>
+    <packaging>jar</packaging>
+    <name>Kyuubi Project Util</name>
+    <url>https://kyuubi.apache.org/</url>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>${skipTests}</skipTests>
+                </configuration>
+            </plugin>
+            <plugin>
+                <groupId>net.alchim31.maven</groupId>
+                <artifactId>scala-maven-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>attach-scaladocs</id>
+                        <phase>none</phase>
+                    </execution>
+                </executions>
+            </plugin>
+            <plugin>
+                <groupId>org.scalatest</groupId>
+                <artifactId>scalatest-maven-plugin</artifactId>
+                <configuration>
+                    <skipTests>true</skipTests>
+                </configuration>
+            </plugin>
+            <plugin>
+                <groupId>org.scalastyle</groupId>
+                <artifactId>scalastyle-maven-plugin</artifactId>
+                <configuration>
+                    <skip>true</skip>
+                </configuration>
+            </plugin>
+        </plugins>
+        <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
+        <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
+    </build>
+</project>
diff --git a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynClasses.java b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynClasses.java
similarity index 98%
rename from kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynClasses.java
rename to kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynClasses.java
index 05661e6a6e5..78bdd54af6d 100644
--- a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynClasses.java
+++ b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynClasses.java
@@ -17,7 +17,7 @@
* under the License.
*/
-package org.apache.kyuubi.reflection;
+package org.apache.kyuubi.util.reflect;
import java.util.LinkedHashSet;
import java.util.Set;
diff --git a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynConstructors.java b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynConstructors.java
similarity index 99%
rename from kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynConstructors.java
rename to kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynConstructors.java
index 59c79b88502..daf55363a35 100644
--- a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynConstructors.java
+++ b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynConstructors.java
@@ -17,7 +17,7 @@
* under the License.
*/
-package org.apache.kyuubi.reflection;
+package org.apache.kyuubi.util.reflect;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
diff --git a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynFields.java b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynFields.java
similarity index 97%
rename from kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynFields.java
rename to kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynFields.java
index 9430d54e9bb..4c7af9036c7 100644
--- a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynFields.java
+++ b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynFields.java
@@ -17,7 +17,7 @@
* under the License.
*/
-package org.apache.kyuubi.reflection;
+package org.apache.kyuubi.util.reflect;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
@@ -25,7 +25,6 @@
import java.security.PrivilegedAction;
import java.util.HashSet;
import java.util.Set;
-import org.apache.commons.lang3.builder.ToStringBuilder;
/** Copied from iceberg-common */
public class DynFields {
@@ -66,11 +65,8 @@ public void set(Object target, T value) {
@Override
public String toString() {
- return new ToStringBuilder(this)
- .append("class", field.getDeclaringClass().toString())
- .append("name", name)
- .append("type", field.getType())
- .toString();
+ return String.format(
+ "DynFields{class=%s,name=%s,type=%s}", field.getDeclaringClass(), name, field.getType());
}
/**
diff --git a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynMethods.java b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynMethods.java
similarity index 99%
rename from kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynMethods.java
rename to kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynMethods.java
index 85264191ce7..ca059df0bc9 100644
--- a/kyuubi-common/src/main/java/org/apache/kyuubi/reflection/DynMethods.java
+++ b/kyuubi-util/src/main/java/org/apache/kyuubi/util/reflect/DynMethods.java
@@ -17,7 +17,7 @@
* under the License.
*/
-package org.apache.kyuubi.reflection;
+package org.apache.kyuubi.util.reflect;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
diff --git a/kyuubi-zookeeper/pom.xml b/kyuubi-zookeeper/pom.xml
index cd3e017cd8f..c4309fbab9b 100644
--- a/kyuubi-zookeeper/pom.xml
+++ b/kyuubi-zookeeper/pom.xml
@@ -21,10 +21,10 @@
     <parent>
         <groupId>org.apache.kyuubi</groupId>
         <artifactId>kyuubi-parent</artifactId>
-        <version>1.8.0-SNAPSHOT</version>
+        <version>1.9.0-SNAPSHOT</version>
     </parent>

-    <artifactId>kyuubi-zookeeper_2.12</artifactId>
+    <artifactId>kyuubi-zookeeper_${scala.binary.version}</artifactId>
     <packaging>jar</packaging>
     <name>Kyuubi Project Embedded Zookeeper</name>
     <url>https://kyuubi.apache.org/</url>
@@ -37,8 +37,8 @@