[SPARK-44719][SQL] Fix NoClassDefFoundError when using Hive UDF #42446
Conversation
```diff
@@ -1838,7 +1838,7 @@
       <groupId>org.codehaus.jackson</groupId>
       <artifactId>jackson-mapper-asl</artifactId>
       <version>${codehaus.jackson.version}</version>
-      <scope>test</scope>
+      <scope>${hive.deps.scope}</scope>
```
The master branch uses Hadoop 3.3.6, so Hadoop doesn't need this.
branch-3.5 uses Hadoop 3.3.4, so Hadoop still needs this.

thanks, merged to master
### What changes were proposed in this pull request?

This PR changes jackson-mapper-asl's scope from `test` to `${hive.deps.scope}`.

### Why are the changes needed?

Fix `NoClassDefFoundError` when using Hive UDF:

```
spark-sql (default)> add jar /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
Time taken: 0.413 seconds
spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 'net.petrabarus.hiveudfs.LongToIP';
Time taken: 0.038 seconds
spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT long_to_ip(2130706433L) FROM range(10)]
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
	at org.apache.hadoop.hive.ql.udf.UDFJson.<clinit>(UDFJson.java:64)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes apache#42446 from wangyum/SPARK-44719.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Kent Yao <yao@apache.org>
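The stack trace above shows `UDFJson`'s static initializer failing because `org.codehaus.jackson.map.type.TypeFactory` is not on the class path. A quick way to check whether the legacy CodeHaus Jackson classes are visible to a JVM is a `Class.forName` probe; the class below is an illustrative sketch, not part of Spark or Hive:

```java
// Illustrative classpath probe (not Spark code): reports whether a class
// can be resolved by the current class loader.
public class ClasspathProbe {
    static boolean isPresent(String className) {
        try {
            // initialize=false: resolve the class without running its static
            // initializer, which is where UDFJson's failure originates.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String target = "org.codehaus.jackson.map.type.TypeFactory";
        // Output depends on what jars are on the class path at launch.
        System.out.println(target + " present: " + isPresent(target));
    }
}
```

Running this against a Spark distribution's `jars` directory shows directly whether the jackson-asl jars made it into the runtime class path.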
### What changes were proposed in this pull request?

This PR aims to provide a new profile, `hive-jackson-provided`, for Apache Spark 4.0.0.

### Why are the changes needed?

Since Apache Hadoop 3.3.5, only Apache Hive requires the old CodeHaus Jackson dependencies. Apache Spark 3.5.0 tried to eliminate them completely, but that was reverted due to Hive UDF support.

- #40893
- #42446

This allows Apache Spark 4.0 users:

- to provide their own CodeHaus Jackson libraries
- to exclude them completely if they don't use Hive UDFs

### Does this PR introduce _any_ user-facing change?

No, this is a new profile.

### How was this patch tested?

Pass the CIs and a manual build.

**Without `hive-jackson-provided`**
```
$ dev/make-distribution.sh -Phive,hive-thriftserver
$ ls -al dist/jars/*asl*
-rw-r--r--  1 dongjoon  staff  232248 Feb 21 10:53 dist.org/jars/jackson-core-asl-1.9.13.jar
-rw-r--r--  1 dongjoon  staff  780664 Feb 21 10:53 dist.org/jars/jackson-mapper-asl-1.9.13.jar
```

**With `hive-jackson-provided`**
```
$ dev/make-distribution.sh -Phive,hive-thriftserver,hive-jackson-provided
$ ls -al dist/jars/*asl*
zsh: no matches found: dist/jars/*asl*

$ ls -al dist/jars/*hive*
-rw-r--r--  1 dongjoon  staff    183633 Feb 21 11:00 dist/jars/hive-beeline-2.3.9.jar
-rw-r--r--  1 dongjoon  staff     44704 Feb 21 11:00 dist/jars/hive-cli-2.3.9.jar
-rw-r--r--  1 dongjoon  staff    436169 Feb 21 11:00 dist/jars/hive-common-2.3.9.jar
-rw-r--r--  1 dongjoon  staff  10840949 Feb 21 11:00 dist/jars/hive-exec-2.3.9-core.jar
-rw-r--r--  1 dongjoon  staff    116364 Feb 21 11:00 dist/jars/hive-jdbc-2.3.9.jar
-rw-r--r--  1 dongjoon  staff    326585 Feb 21 11:00 dist/jars/hive-llap-common-2.3.9.jar
-rw-r--r--  1 dongjoon  staff   8195966 Feb 21 11:00 dist/jars/hive-metastore-2.3.9.jar
-rw-r--r--  1 dongjoon  staff    916630 Feb 21 11:00 dist/jars/hive-serde-2.3.9.jar
-rw-r--r--  1 dongjoon  staff   1679366 Feb 21 11:00 dist/jars/hive-service-rpc-3.1.3.jar
-rw-r--r--  1 dongjoon  staff     53902 Feb 21 11:00 dist/jars/hive-shims-0.23-2.3.9.jar
-rw-r--r--  1 dongjoon  staff      8786 Feb 21 11:00 dist/jars/hive-shims-2.3.9.jar
-rw-r--r--  1 dongjoon  staff    120293 Feb 21 11:00 dist/jars/hive-shims-common-2.3.9.jar
-rw-r--r--  1 dongjoon  staff     12923 Feb 21 11:00 dist/jars/hive-shims-scheduler-2.3.9.jar
-rw-r--r--  1 dongjoon  staff    258346 Feb 21 11:00 dist/jars/hive-storage-api-2.8.1.jar
-rw-r--r--  1 dongjoon  staff    581739 Feb 21 11:00 dist/jars/spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar
-rw-r--r--  1 dongjoon  staff    687446 Feb 21 11:00 dist/jars/spark-hive_2.13-4.0.0-SNAPSHOT.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45201 from dongjoon-hyun/SPARK-47119.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
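A scope-switching profile of this kind is usually wired through a Maven property. The sketch below is illustrative only: the property and profile names (`hive.deps.scope`, `hive-jackson-provided`) follow the PR descriptions, but Spark's actual `pom.xml` may structure this differently.

```xml
<!-- Illustrative sketch; not Spark's actual pom.xml. -->
<properties>
  <!-- Default: ship the legacy Jackson jars with the distribution. -->
  <hive.deps.scope>compile</hive.deps.scope>
</properties>

<dependencies>
  <dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>${codehaus.jackson.version}</version>
    <!-- Resolves to "compile" by default, "provided" under the profile. -->
    <scope>${hive.deps.scope}</scope>
  </dependency>
</dependencies>

<profiles>
  <!-- Activated with -Phive-jackson-provided: the user supplies the jars. -->
  <profile>
    <id>hive-jackson-provided</id>
    <properties>
      <hive.deps.scope>provided</hive.deps.scope>
    </properties>
  </profile>
</profiles>
```

Because `provided` dependencies are excluded from the packaged distribution, building with the profile is what makes the `ls dist/jars/*asl*` glob above come up empty.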
… a new optional directory

### What changes were proposed in this pull request?

This PR aims to provide Apache Hive's CodeHaus Jackson dependencies via a new optional directory, `hive-jackson`, instead of the standard `jars` directory of the Apache Spark binary distribution. Additionally, two internal configurations are added whose default values are `hive-jackson/*`:

- `spark.driver.defaultExtraClassPath`
- `spark.executor.defaultExtraClassPath`

For example, Apache Spark distributions have been providing the `spark-*-yarn-shuffle.jar` file under the `yarn` directory instead of `jars`.

**YARN SHUFFLE EXAMPLE**
```
$ ls -al yarn/*jar
-rw-r--r--  1 dongjoon  staff  77352048 Sep  8 19:08 yarn/spark-3.5.0-yarn-shuffle.jar
```

This PR handles Apache Hive's CodeHaus Jackson dependencies in a similar way.

**BEFORE**
```
$ ls -al jars/*asl*
-rw-r--r--  1 dongjoon  staff  232248 Sep  8 19:08 jars/jackson-core-asl-1.9.13.jar
-rw-r--r--  1 dongjoon  staff  780664 Sep  8 19:08 jars/jackson-mapper-asl-1.9.13.jar
```

**AFTER**
```
$ ls -al jars/*asl*
zsh: no matches found: jars/*asl*

$ ls -al hive-jackson
total 1984
drwxr-xr-x   4 dongjoon  staff     128 Feb 23 15:37 .
drwxr-xr-x  16 dongjoon  staff     512 Feb 23 16:34 ..
-rw-r--r--   1 dongjoon  staff  232248 Feb 23 15:37 jackson-core-asl-1.9.13.jar
-rw-r--r--   1 dongjoon  staff  780664 Feb 23 15:37 jackson-mapper-asl-1.9.13.jar
```

### Why are the changes needed?

Since Apache Hadoop 3.3.5, only Apache Hive requires the old CodeHaus Jackson dependencies. Apache Spark 3.5.0 tried to eliminate them completely, but that was reverted due to Hive UDF support.

- #40893
- #42446

SPARK-47119 added a way to exclude the Apache Hive Jackson dependencies at the distribution-building stage for Apache Spark 4.0.0.

- #45201

This PR provides a way to exclude the Apache Hive Jackson dependencies at runtime for Apache Spark 4.0.0.

- Spark Shell without Apache Hive Jackson dependencies:
  ```
  $ bin/spark-shell --driver-default-class-path ""
  ```
- Spark SQL shell without Apache Hive Jackson dependencies:
  ```
  $ bin/spark-sql --driver-default-class-path ""
  ```
- Spark Thrift Server without Apache Hive Jackson dependencies:
  ```
  $ sbin/start-thriftserver.sh --driver-default-class-path ""
  ```

In addition, this PR eliminates the CodeHaus Jackson dependencies from the following Apache Spark daemons (launched via `spark-daemon.sh start`) because they don't require them:

- Spark Master
- Spark Worker
- Spark History Server

```
$ grep 'spark-daemon.sh start' *
start-history-server.sh:exec "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 "$"
start-master.sh:"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
start-worker.sh:  "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS $WORKER_NUM \
```

### Does this PR introduce _any_ user-facing change?

No, there is no user-facing change by default.

- For distributions built with the `hive-jackson-provided` profile, the scope of the Apache Hive Jackson dependencies is `provided` and the `hive-jackson` directory is not created at all.
- For distributions with the default settings, the scope of the Apache Hive Jackson dependencies is still `compile`. In addition, they are on Apache Spark's built-in class path, like the following.

  ![Screenshot 2024-02-23 at 16 48 08](https://github.com/apache/spark/assets/9700541/99ed0f02-2792-4666-ae19-ce4f4b7b8ff9)
- The following Spark daemons don't use the CodeHaus Jackson dependencies:
  - Spark Master
  - Spark Worker
  - Spark History Server

### How was this patch tested?

Pass the CIs, and manually build a distribution and check the class paths in the `Environment` tab.

```
$ dev/make-distribution.sh -Phive,hive-thriftserver
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45237 from dongjoon-hyun/SPARK-47152.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
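The default-then-override behavior described above (a built-in `hive-jackson/*` entry that `--driver-default-class-path ""` can blank out) can be sketched in plain shell. Everything below is illustrative, not Spark's actual launcher code; the paths and variable names are assumptions.

```shell
# Sketch of a launcher assembling the class path with an overridable
# default extra entry (mirroring spark.driver.defaultExtraClassPath).
SPARK_HOME="/opt/spark"            # assumed install location
DEFAULT_EXTRA="hive-jackson/*"     # documented default value
OVERRIDE_SET=false                 # true when --driver-default-class-path is passed
OVERRIDE_VALUE=""                  # the value the user passed (may be empty)

# An explicit override wins, even when it is the empty string.
if [ "$OVERRIDE_SET" = true ]; then
  EXTRA="$OVERRIDE_VALUE"
else
  EXTRA="$DEFAULT_EXTRA"
fi

CLASSPATH="$SPARK_HOME/jars/*"
# Only append the extra entry when it is non-empty.
if [ -n "$EXTRA" ]; then
  CLASSPATH="$CLASSPATH:$SPARK_HOME/$EXTRA"
fi
echo "$CLASSPATH"
```

With `OVERRIDE_SET=true` and an empty value, the `hive-jackson` entry drops out entirely, which is the runtime-exclusion path the PR describes for the shells and the Thrift Server.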