[SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper #28079
HeartSaVioR wants to merge 3 commits into apache:master
…ntext classloader after transformed
```scala
}
}

test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
```
This test has been moved to HiveUDFDynamicLoadSuite; it now runs against all 5 available Hive UDF types.
```scala
  clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
    .asInstanceOf[Class[_ <: AnyRef]]
}
val func = clazz.getConstructor().newInstance().asInstanceOf[UDFType]
```
If we add `clazz = null` below this line, the new UT (SPARK-31312) fails for the `UDF` type. Only one of the 5 cases fails, because for the other types the instance itself is cached instead.
cc @cloud-fan @maropu
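The observation above can be illustrated with a simplified, self-contained model of the pre-fix behavior. This is a sketch, not Spark's code: `OldWrapper`, `Loader`, `jarLoader`, `otherLoader`, and `copyWrapper` are hypothetical names, a classloader is modeled as a function from class name to `Class`, and JDK classes stand in for Hive UDF classes (`java.lang.StringBuilder` plays a simple `UDF`, `java.util.ArrayList` a non-simple type). Because only the instance is cached, and only for non-simple types, a copied wrapper for a simple UDF has to resolve the class by name again and fails under a loader that cannot see it.

```scala
// Hypothetical model: a "classloader" either resolves a class name or throws CNFE.
type Loader = String => Class[_ <: AnyRef]

val jarLoader: Loader = name => Class.forName(name).asInstanceOf[Class[_ <: AnyRef]]
val otherLoader: Loader = name => throw new ClassNotFoundException(name)

// Hypothetical, simplified pre-fix wrapper (not Spark's HiveFunctionWrapper):
// only the instance is cached, never the resolved Class.
final class OldWrapper(
    val functionClassName: String,
    isSimpleUdf: AnyRef => Boolean, // stand-in for `func.isInstanceOf[UDF]`
    private var instance: AnyRef = null) {

  def createFunction(loader: Loader): AnyRef = {
    if (instance != null) {
      instance
    } else {
      // The class is looked up by name on every call that has no cached instance.
      val func: AnyRef = loader(functionClassName).getConstructor().newInstance()
      if (!isSimpleUdf(func)) instance = func // simple UDFs get a new instance each time
      func
    }
  }

  // Copying an expression copies its wrapper, which carries only the cached instance.
  def copyWrapper(): OldWrapper = new OldWrapper(functionClassName, isSimpleUdf, instance)
}

val simple = new OldWrapper("java.lang.StringBuilder", _.isInstanceOf[java.lang.StringBuilder])
val generic = new OldWrapper("java.util.ArrayList", _ => false)
simple.createFunction(jarLoader)
generic.createFunction(jarLoader)

// Evaluate the copies while a loader that cannot see the class is "current".
val genericOk = scala.util.Try(generic.copyWrapper().createFunction(otherLoader)).isSuccess
val simpleOk = scala.util.Try(simple.copyWrapper().createFunction(otherLoader)).isSuccess
println(s"generic copy: $genericOk, simple copy: $simpleOk") // generic copy: true, simple copy: false
```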
```scala
val jarUrl = getHiveUDFTestJarUrl
test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
  s"current thread context classloader (${udfInfo.identifier}") {
  testHiveUDFUsingJarWithChangingClassloader(
```
nit: if the method is only called once, can we inline it?
```scala
test("SPARK-31312 Transformed Hive UDF using jar expression should not be failed to run " +
  s"regardless of current thread context classloader (${udfInfo.identifier})") {
  testHiveUDFUsingJarWithChangingClassloaderWithCopyUDFExpression(
```
```scala
assert(Thread.currentThread().getContextClassLoader eq
  spark.sqlContext.sharedState.jarClassLoader)

val udfExpr = fnCreateHiveUDFExpression()
```
I think this test should start here. The above test is the same as testHiveUDFUsingJarWithChangingClassloader.
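The assertion in this test checks which loader is the current thread context classloader. Tests that change it usually follow a swap-and-restore pattern; the following is a minimal sketch assuming a hypothetical helper named `withContextClassLoader` (not necessarily what the Spark test suite uses), with an empty `URLClassLoader` standing in for a different classloader.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical helper: runs `body` with `loader` as the thread context
// classloader and always restores the previous loader, even if `body` throws.
def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
  val thread = Thread.currentThread()
  val original = thread.getContextClassLoader
  thread.setContextClassLoader(loader)
  try body
  finally thread.setContextClassLoader(original)
}

val before = Thread.currentThread().getContextClassLoader
// An empty URLClassLoader stands in for "some other" classloader.
val other = new URLClassLoader(Array.empty[URL], ClassLoader.getSystemClassLoader)

val seenInside = withContextClassLoader(other) {
  Thread.currentThread().getContextClassLoader
}
val after = Thread.currentThread().getContextClassLoader

println(seenInside eq other) // true
println(after eq before)     // true
```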
Thanks for the review comments. I've consolidated the two tests into one and inlined the method. Please take another look.
Test build #120634 has finished for PR 28079 at commit
```scala
)

udfTestInfos.foreach { udfInfo =>
  val jarUrl = getHiveUDFTestJarUrl
```
We can inline it as well.
Test build #120641 has finished for PR 28079 at commit
Test build #120642 has finished for PR 28079 at commit
…unctionWrapper

### What changes were proposed in this pull request?

This patch proposes to cache Class instance for the UDF instance in HiveFunctionWrapper to fix the case where Hive simple UDF is somehow transformed (expression is copied) and evaluated later with another classloader (for the case current thread context classloader is somehow changed). In this case, Spark throws CNFE as of now.

It's only occurred for Hive simple UDF, as HiveFunctionWrapper caches the UDF instance whereas it doesn't do for `UDF` type. The comment says Spark has to create instance every time for UDF, so we cannot simply do the same. This patch caches Class instance instead, and switch current thread context classloader to which loads the Class instance.

This patch extends the test boundary as well. We only tested with GenericUDTF for SPARK-26560, and this patch actually requires only UDF. But to avoid regression for other types as well, this patch adds all available types (UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF) into the boundary of tests.

Credit to cloud-fan as he discovered the problem and proposed the solution.

### Why are the changes needed?

Above section describes why it's a bug and how it's fixed.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New UTs added.

Closes #28079 from HeartSaVioR/SPARK-31312.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2a6aa8e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master/3.0!
@HeartSaVioR, can you send a backport PR for 2.4? Thanks!
Thanks for the quick review and merge! #28086 is for branch-2.4.
Late LGTM, thanks for the work, @HeartSaVioR!
(Very late) LGTM!
### What changes were proposed in this pull request?
This patch proposes to cache the `Class` instance for the UDF in `HiveFunctionWrapper`, to fix the case where a Hive simple UDF expression is transformed (i.e., the expression is copied) and later evaluated with a different classloader (i.e., when the current thread context classloader has changed in the meantime). Currently, Spark throws a ClassNotFoundException (CNFE) in this case.
The problem occurs only for Hive simple UDFs: `HiveFunctionWrapper` caches the function instance for the other UDF types, but not for the `UDF` type. A code comment states that Spark has to create a new instance every time for `UDF`, so we cannot simply cache the instance there as well. Instead, this patch caches the `Class` instance, and switches the current thread context classloader to the one that loaded the `Class` instance.

This patch also extends the test coverage. For SPARK-26560 we only tested with GenericUDTF, and this fix only requires UDF; but to avoid regressions in the other types as well, the tests now cover all available types (UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF).
Credit to @cloud-fan, who discovered the problem and proposed the solution.
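The caching change described above can be sketched with a simplified model: cache the resolved `Class` next to the instance and carry it into copies, so a copied wrapper never has to look the class name up again. `FixedWrapper`, `Loader`, `jarLoader`, `otherLoader`, and `copyWrapper` are hypothetical names, classloaders are modeled as functions from class name to `Class`, JDK classes stand in for Hive UDF classes, and the switch of the thread context classloader is not modeled here.

```scala
// Hypothetical model: a "classloader" either resolves a class name or throws CNFE.
type Loader = String => Class[_ <: AnyRef]

val jarLoader: Loader = name => Class.forName(name).asInstanceOf[Class[_ <: AnyRef]]
val otherLoader: Loader = name => throw new ClassNotFoundException(name)

// Hypothetical, simplified post-fix wrapper (not Spark's HiveFunctionWrapper):
// the resolved Class is cached alongside the instance.
final class FixedWrapper(
    val functionClassName: String,
    isSimpleUdf: AnyRef => Boolean, // stand-in for `func.isInstanceOf[UDF]`
    private var instance: AnyRef = null,
    private var clazz: Class[_ <: AnyRef] = null) {

  def createFunction(loader: Loader): AnyRef = {
    if (instance != null) {
      instance
    } else {
      if (clazz == null) {
        clazz = loader(functionClassName) // resolve the name only once
      }
      val func: AnyRef = clazz.getConstructor().newInstance()
      if (!isSimpleUdf(func)) instance = func // simple UDFs are still never cached
      func
    }
  }

  // The copy carries the cached Class, so it needs no lookup by name.
  def copyWrapper(): FixedWrapper =
    new FixedWrapper(functionClassName, isSimpleUdf, instance, clazz)
}

val simple = new FixedWrapper("java.lang.StringBuilder", _.isInstanceOf[java.lang.StringBuilder])
val first = simple.createFunction(jarLoader)
val copied = simple.copyWrapper()
val second = copied.createFunction(otherLoader) // succeeds: the cached Class is reused
println(first ne second) // true
```

Simple UDFs still get a fresh instance on every call, which keeps the requirement stated in the code comment; only the class lookup is reused.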
### Why are the changes needed?
The section above describes why this is a bug and how it is fixed.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
New UTs added.