[SPARK-6747][SQL] Support List<> as a return type in Hive UDF #5395

Closed
wants to merge 5 commits into from

maropu
Member

@maropu maropu commented Apr 7, 2015

This patch adds support for List<> as a return type in Hive UDFs.

Consider the following UDF:
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDF;
public class UDFToListString extends UDF {
  public List<String> evaluate(Object o) {
    return Arrays.asList("xxx", "yyy", "zzz");
  }
}
With the current implementation, using this UDF throws a scala.MatchError:
scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...
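The trace is consistent with JVM type erasure: javaClassToDataType is handed a raw java.lang.Class, and the raw class of a List<String> return type is just "interface java.util.List", with no element type attached. A minimal, self-contained sketch using plain Java reflection contrasts Method.getReturnType() with Method.getGenericReturnType(). ErasureDemo and ListReturningUdf are hypothetical names for illustration; this is not Spark's code and does not depend on Hive.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

public class ErasureDemo {
    // Hypothetical stand-in for UDFToListString. It does not extend Hive's UDF,
    // so the sketch compiles without Hive on the classpath.
    public static class ListReturningUdf {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    private static Method evaluateMethod() {
        try {
            return ListReturningUdf.class.getMethod("evaluate", Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The raw Class of the return type: it carries no type arguments.
    static Class<?> rawReturnType() {
        return evaluateMethod().getReturnType();
    }

    // The generic return type keeps the type argument declared in the signature.
    static Type elementType() {
        ParameterizedType pt = (ParameterizedType) evaluateMethod().getGenericReturnType();
        return pt.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(rawReturnType()); // interface java.util.List
        System.out.println(elementType());   // class java.lang.String
    }
}
```

Only the generic signature still carries String, so mapping a List<> return type to a concrete SQL type needs the generic return type (or some other source of the element type); the raw Class alone is not enough.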

@AmplabJenkins

Can one of the admins verify this patch?

  public List<String> evaluate(Object o) {
    return Arrays.asList("data1", "data2", "data3");
  }
}
Contributor

Add a blank line at the end of file.

Member Author

Fixed

@chenghao-intel
Contributor

@maropu My concern is whether Hive supports UDFs whose return type is List<Object>. Can you confirm that, or provide a Hive comparison unit test?

@marmbrus
Contributor

marmbrus commented Apr 7, 2015

ok to test

@SparkQA

SparkQA commented Apr 7, 2015

Test build #29807 has started for PR 5395 at commit bd165b9.

@SparkQA

SparkQA commented Apr 7, 2015

Test build #29807 has finished for PR 5395 at commit bd165b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29807/

@SparkQA

SparkQA commented Apr 8, 2015

Test build #29825 has started for PR 5395 at commit 02b3a91.

@maropu
Member Author

maropu commented Apr 8, 2015

Ok, I will look into the implementation and the documentation of Hive for that.

@SparkQA

SparkQA commented Apr 8, 2015

Test build #29825 has finished for PR 5395 at commit 02b3a91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29825/

// Hive seems to return this for struct types?
case c: Class[_] if c == classOf[java.lang.Object] => NullType

case c => throw new HiveDataTypeException("Unknown java type: " + c)
Contributor

This should just be an AnalysisException.

Contributor

Also, prefer string interpolation to +: s"Unknown UDF input type $c".

Member Author

s"Unsupported java type $c" seems better for this error message, because this method is not used only for UDFs.
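The thread above is about replacing an unmatched case with an explicit "unsupported type" error. As a hedged illustration only (Java rather than Scala, returning type names as plain strings instead of Spark's DataType objects, and using IllegalArgumentException as a stand-in for Spark's AnalysisException), a mapper that takes a java.lang.reflect.Type can both report unsupported types explicitly and, for a generic List<T> signature, recover the element type. JavaTypeMapper, toTypeName, and sampleListOfStrings are hypothetical names, not Spark APIs.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class JavaTypeMapper {
    // Hypothetical mapping from Java reflection types to Spark SQL type *names*.
    static String toTypeName(Type t) {
        if (t == String.class) {
            return "StringType";
        }
        if (t == Integer.class || t == int.class) {
            return "IntegerType";
        }
        if (t == Long.class || t == long.class) {
            return "LongType";
        }
        if (t == Object.class) {
            // Mirrors the case in the diff above: java.lang.Object maps to NullType.
            return "NullType";
        }
        if (t instanceof ParameterizedType) {
            ParameterizedType pt = (ParameterizedType) t;
            if (pt.getRawType() == List.class) {
                // Only a generic signature carries the element type; a raw Class cannot.
                return "ArrayType(" + toTypeName(pt.getActualTypeArguments()[0]) + ")";
            }
        }
        // Explicit catch-all instead of an unmatched case (cf. scala.MatchError).
        throw new IllegalArgumentException("Unsupported java type " + t);
    }

    // Hypothetical sample signature, used only to obtain a List<String> ParameterizedType.
    static List<String> sampleListOfStrings() {
        return null;
    }

    static Type sampleListType() {
        try {
            return JavaTypeMapper.class.getDeclaredMethod("sampleListOfStrings").getGenericReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, toTypeName(sampleListType()) yields "ArrayType(StringType)", while passing the raw List.class (all that getReturnType() provides) still falls through to the exception, because the element type cannot be recovered from it.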

@marmbrus
Contributor

Thanks for researching this. Can you address the final comments about avoiding the creation of a new type?

@SparkQA

SparkQA commented Apr 14, 2015

Test build #30253 has started for PR 5395 at commit 3a8d952.

@maropu
Member Author

maropu commented Apr 14, 2015

Sorry for the delay. Fixed; please re-check.

@SparkQA

SparkQA commented Apr 14, 2015

Test build #30253 has finished for PR 5395 at commit 3a8d952.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.6.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30253/

@SparkQA

SparkQA commented Apr 14, 2015

Test build #30265 has started for PR 5395 at commit 8e333c7.

@SparkQA

SparkQA commented Apr 14, 2015

Test build #30265 has finished for PR 5395 at commit 8e333c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30265/

@marmbrus
Contributor

This is still creating a new type. Can we use NullType instead?

@maropu
Member Author

maropu commented Apr 17, 2015

Missed that; now fixed. Does this address your point?

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30445 has started for PR 5395 at commit ee56a0a.

@marmbrus
Contributor

Yes, LGTM

@SparkQA
Copy link

SparkQA commented Apr 17, 2015

Test build #30445 has finished for PR 5395 at commit ee56a0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30445/

@maropu
Member Author

maropu commented Apr 27, 2015

cc @marmbrus Could you merge this into master? I'm going to open a PR for SPARK-6912, which depends on this one.

@AmplabJenkins

Can one of the admins verify this patch?

@maropu
Member Author

maropu commented May 7, 2015

cc @marmbrus just a reminder

@marmbrus
Contributor

marmbrus commented May 7, 2015

The last patch failed tests, no?

@marmbrus
Contributor

marmbrus commented May 7, 2015

ok to test

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 7, 2015

Test build #32142 has started for PR 5395 at commit ee56a0a.

@SparkQA

SparkQA commented May 7, 2015

Test build #32142 has finished for PR 5395 at commit ee56a0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32142/

@maropu
Member Author

maropu commented May 12, 2015

Oh, sorry. I'll fix it.

@maropu maropu closed this May 15, 2015
@maropu maropu deleted the FixBugInHiveInspectors branch May 15, 2015 06:07
@maropu
Member Author

maropu commented May 15, 2015

@marmbrus I closed this PR by mistake and can't re-open it, so may I open a new PR?

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
PRs Merged
1. [Internal] Add AppleAwsClientFactory for Mascot (apache#577)
2. Hive: Log new metadata location in commit (apache#4681)
3. change timeout to 120 for now (apache#661)
4. Internal: Add hive_catalog parameter to SparkCatalog (apache#670)
5. Internal: Pull catalog setting to CachedClientPool (apache#673)
6. Core: Defer reading Avro metadata until ManifestFile is read (apache#5206)
7. API: Fix ID assignment in schema merging (apache#5395)
8. AWS: S3OutputStream - failure to close should persist on subsequent close calls (apache#5311)
9. API: Allow schema updates to find fields with case-insensitivity (apache#5440)
10. Spark 3.3: Spark mergeSchema to respect Spark Case Sensitivity Configuration (apache#5441)
5 participants