
[SPARK-8478][SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf #6920

Closed
wants to merge 9 commits

Conversation

BenFradet
Contributor

Follow-up of #6902 to make the naming consistent between Udf and UDF.
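
For context, a minimal, self-contained sketch of the kind of rename this covers (the identifiers below are illustrative; the actual classes and methods touched are in the diff):

```scala
// A self-contained sketch of the naming cleanup (identifiers are illustrative,
// not the actual classes in the diff): "Udf" spellings become "UDF".
object UdfNamingSketch extends App {
  // Before: mixed casing across the codebase.
  // case class ScalaUdf(name: String, arity: Int)
  // def callUdf(name: String): String = s"calling $name"

  // After: "UDF" spelled uniformly in type and method names.
  case class ScalaUDF(name: String, arity: Int)
  def callUDF(name: String): String = s"calling $name"

  println(callUDF(ScalaUDF("upper", 1).name)) // prints: calling upper
}
```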

@SparkQA

SparkQA commented Jun 20, 2015

Test build #35364 has finished for PR 6920 at commit 660b669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BenFradet
Contributor Author

Pinging @marmbrus: this is the PR we talked about on the JIRA for SPARK-8356.

@SparkQA

SparkQA commented Jun 23, 2015

Test build #35589 has finished for PR 6920 at commit ef53470.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BenFradet
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jun 23, 2015

Test build #35606 has finished for PR 6920 at commit ef53470.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BenFradet
Contributor Author

Jenkins, retest this please (#6974)

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35663 has finished for PR 6920 at commit ef53470.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #36009 has finished for PR 6920 at commit c500f29.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BenFradet
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #36017 has finished for PR 6920 at commit c500f29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks, merged to master.

@asfgit asfgit closed this in 931da5c Jun 29, 2015
wangyum pushed a commit that referenced this pull request May 31, 2023
…sure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action

### What changes were proposed in this pull request?
This PR removes `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on GitHub Actions.

### Why are the changes needed?
After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` became a test-scope dependency, so it is not on the classpath when the benchmark runs on GitHub Actions, because the workflow uses

https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183

instead of the sbt `Test/runMain`.

`ObjectHashAggregateExecBenchmark` uses `TestHive`, and before this PR `TestHive` always called `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to initialize `originalUDFs`, so running `ObjectHashAggregateExecBenchmark` on GitHub Actions fails with the following exception:

```
Error: Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
	at org.apache.hadoop.hive.ql.udf.UDFJson.<clinit>(UDFJson.java:64)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
	at org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
	at org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:322)
	at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:530)
	at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:185)
	at org.apache.spark.sql.hive.test.TestHiveContext.<init>(TestHive.scala:133)
	at org.apache.spark.sql.hive.test.TestHive$.<init>(TestHive.scala:54)
	at org.apache.spark.sql.hive.test.TestHive$.<clinit>(TestHive.scala:53)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.getSparkSession(ObjectHashAggregateExecBenchmark.scala:47)
	at org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark.$init$(SqlBasedBenchmark.scala:35)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.<clinit>(ObjectHashAggregateExecBenchmark.scala:45)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark.main(ObjectHashAggregateExecBenchmark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128)
	at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1328)
	at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91)
	at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1025)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1116)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1125)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.type.TypeFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 40 more
```

I then found that `originalUDFs` is now an unused val in `TestHive` (SPARK-1251 | #6920 introduced it, and it became unused after SPARK-20667 | #17908), so this PR removes it from `TestHive` to avoid calling `FunctionRegistry#getFunctionNames`.
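
For illustration, a rough sketch of the shape of the removal (an approximation of the relevant `TestHive.scala` member, assuming the Hive `hive-exec` classes are on the classpath; not the exact source):

```scala
// Rough sketch of the removed member (an approximation, not the exact TestHive.scala code;
// assumes hive-exec is on the classpath). Merely touching FunctionRegistry runs its static
// initializer, which registers built-in UDFs such as UDFJson and therefore requires
// org.codehaus.jackson (jackson-mapper-asl) to be loadable.
import org.apache.hadoop.hive.ql.exec.FunctionRegistry

class TestHiveSessionSketch {
  // Before: eagerly snapshotted the Hive built-in function names at session construction.
  val originalUDFs: java.util.Set[String] = FunctionRegistry.getFunctionNames

  // After: the val is simply deleted. Nothing has read it since SPARK-20667 (#17908),
  // and without it constructing the test session no longer triggers FunctionRegistry's
  // class initialization, so the benchmark runs without jackson-mapper-asl.
}
```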

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GitHub Actions
  • Run `ObjectHashAggregateExecBenchmark` on GitHub Actions:

**Before**

https://github.com/LuciferYang/spark/actions/runs/5128228630/jobs/9224706982

<img width="1181" alt="image" src="https://github.com/apache/spark/assets/1475305/02a58e3c-2dad-4ad4-85e4-f8576a5aabed">

**After**

https://github.com/LuciferYang/spark/actions/runs/5128227211/jobs/9224704507

<img width="1282" alt="image" src="https://github.com/apache/spark/assets/1475305/27c70ec6-e55d-4a19-a6c3-e892789b97f7">

`ObjectHashAggregateExecBenchmark` runs successfully.

Closes #41369 from LuciferYang/hive-udf.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…sure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action

prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Apr 9, 2024
…sure `ObjectHashAggregateExecBenchmark` can run successfully on Github Action
