[SPARK-6747][SQL] Support List<> as a return type in Hive UDF #5395
Can one of the admins verify this patch?
  public List<String> evaluate(Object o) {
    return Arrays.asList("data1", "data2", "data3");
  }
}
Add a blank line at the end of file.
Fixed
@maropu my concern is whether Hive supports UDFs whose return type is
ok to test
Test build #29807 has started for PR 5395 at commit
Test build #29807 has finished for PR 5395 at commit
Test PASSed.
Test build #29825 has started for PR 5395 at commit
Ok, I will look into the implementation and the documentation of Hive for that.
Test build #29825 has finished for PR 5395 at commit
Test PASSed.
ISTM Hive supports list<> as a return type (see the link below). https://github.com/kyluka/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java#L163
    // Hive seems to return this for struct types?
    case c: Class[_] if c == classOf[java.lang.Object] => NullType

    case c => throw new HiveDataTypeException("Unknown java type: " + c)
This should just be an AnalysisException.
Also prefer string interpolation to +, e.g. s"Unknown UDF input type $c"
s"Unsupported java type $c" seems to be better in this error message because this method is not only designed for UDF.
Thanks for researching this. Can you address the final comments about avoiding the creation of a new type?
Force-pushed from 02b3a91 to 3a8d952.
Test build #30253 has started for PR 5395 at commit
Sorry for the delay. Fixed and plz re-check them.
Test build #30253 has finished for PR 5395 at commit
Test FAILed.
Test build #30265 has started for PR 5395 at commit
Test build #30265 has finished for PR 5395 at commit
Test PASSed.
This is still creating a new type. Can we use
Force-pushed from 8e333c7 to ee56a0a.
I missed that; fixed now. Does this fix address your point?
Test build #30445 has started for PR 5395 at commit
Yes, LGTM
Test build #30445 has finished for PR 5395 at commit
Test FAILed.
cc @marmbrus Could you merge this into master? I'll open a PR for SPARK-6912, but it depends on this one.
Can one of the admins verify this patch?
cc @marmbrus just a reminder
The last patch failed tests, no?
ok to test
Merged build triggered.
Merged build started.
Test build #32142 has started for PR 5395 at commit
Test build #32142 has finished for PR 5395 at commit
Merged build finished. Test FAILed.
Test FAILed.
Oh, sorry. I'll fix it.
@marmbrus I closed this PR by mistake and can't re-open it, so may I open a new PR?
This patch supports List<> as a return type in Hive UDFs.
Consider the UDF below:
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDF;
public class UDFToListString extends UDF {
  public List<String> evaluate(Object o) {
    return Arrays.asList("xxx", "yyy", "zzz");
  }
}
With the current implementation, using this UDF throws a scala.MatchError:
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...
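The trace shows the failure occurring while the UDF's return class is mapped to a SQL data type: the class being matched is the erased `interface java.util.List`, which carries no element type. As a rough illustration only (this is not Spark's code), the sketch below uses plain Java reflection to recover the element type from the method's generic return type. `ReturnTypeSketch`, `describe`, and the type-name strings are hypothetical, and the nested `UDFToListString` deliberately does not extend Hive's `UDF` so the sketch has no dependencies.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

public class ReturnTypeSketch {

    // Stand-in for the UDF in the description (assumption: no Hive dependency here).
    public static class UDFToListString {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    // Hypothetical helper: turn a reflected Java type into a SQL-like type name.
    public static String describe(Type t) {
        if (t == String.class) {
            return "string";
        }
        if (t == Integer.class || t == int.class) {
            return "int";
        }
        if (t instanceof ParameterizedType) {
            ParameterizedType p = (ParameterizedType) t;
            if (p.getRawType() == List.class) {
                // Recurse into the single type argument of List<...>.
                return "array<" + describe(p.getActualTypeArguments()[0]) + ">";
            }
        }
        // A raw List (no type argument) also ends up here.
        throw new IllegalArgumentException("Unsupported java type " + t);
    }

    public static void main(String[] args) throws Exception {
        Method m = UDFToListString.class.getMethod("evaluate", Object.class);
        // The erased class is what appears in the MatchError message.
        System.out.println(m.getReturnType());                  // prints "interface java.util.List"
        // The generic return type still knows the element type.
        System.out.println(describe(m.getGenericReturnType())); // prints "array<string>"
    }
}
```

Note that a UDF declared with a raw `List` return type gives reflection nothing to recover, so this sketch rejects it rather than guessing an element type.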