[SPARK-28158][SQL] Hive UDFs supports UDT type #24961

Closed
uncleGen wants to merge 6 commits

uncleGen
Contributor

@uncleGen uncleGen commented Jun 25, 2019

What changes were proposed in this pull request?

After this PR, we can create and register Hive UDFs that accept UDT types such as VectorUDT and MatrixUDT. These UDTs are widely used in Spark machine learning.

How was this patch tested?

Added new unit tests.

@SparkQA

SparkQA commented Jun 25, 2019

Test build #106870 has finished for PR 24961 at commit 9039549.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Contributor Author

retest this please.

@SparkQA

SparkQA commented Jun 25, 2019

Test build #106882 has finished for PR 24961 at commit 9039549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Contributor Author

uncleGen commented Jul 2, 2019

@cloud-fan Could you please help to review?

@maropu
Member

maropu commented Jul 2, 2019

How do you handle data of these types in Hive UDFs?

@uncleGen
Contributor Author

uncleGen commented Jul 2, 2019

@maropu For example, the internal SQL type of VectorUDT and MatrixUDT is StructType, so we can use a StructObjectInspector in Hive UDFs to handle data of these types. Here is a demo: https://github.com/aliyun/aliyun-emapreduce-sdk/blob/master-2.x/emr-sql/src/main/scala/org/apache/spark/sql/aliyun/udfs/ml/LogisticRegressionUDF.scala#L86
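
To make the StructObjectInspector idea concrete, here is a minimal sketch. It is not the code from the linked demo or from this PR. It assumes the vector argument reaches the UDF as a struct that has a `values` field holding an array of non-null doubles, which matches the `sqlType` of VectorUDT. The class name `VectorValuesSumUDF` and the function name in `getDisplayString` are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ListObjectInspector, ObjectInspector, PrimitiveObjectInspector, StructField, StructObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, PrimitiveObjectInspectorUtils}

// Hypothetical UDF: sums the "values" array of a vector passed in as a struct.
class VectorValuesSumUDF extends GenericUDF {
  private var structOI: StructObjectInspector = _
  private var valuesField: StructField = _
  private var valuesOI: ListObjectInspector = _
  private var elementOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1 || !arguments(0).isInstanceOf[StructObjectInspector]) {
      throw new UDFArgumentException("Expected exactly one struct argument")
    }
    structOI = arguments(0).asInstanceOf[StructObjectInspector]
    // Assumption: the struct exposes a field named "values" (array of double).
    valuesField = structOI.getStructFieldRef("values")
    valuesOI = valuesField.getFieldObjectInspector.asInstanceOf[ListObjectInspector]
    elementOI = valuesOI.getListElementObjectInspector.asInstanceOf[PrimitiveObjectInspector]
    // The UDF returns a java.lang.Double.
    PrimitiveObjectInspectorFactory.javaDoubleObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val struct = arguments(0).get()
    if (struct == null) return null
    val values = structOI.getStructFieldData(struct, valuesField)
    if (values == null) return null
    // Assumption: array elements are never null, so no per-element null check.
    var sum = 0.0
    var i = 0
    val n = valuesOI.getListLength(values)
    while (i < n) {
      sum += PrimitiveObjectInspectorUtils.getDouble(valuesOI.getListElement(values, i), elementOI)
      i += 1
    }
    java.lang.Double.valueOf(sum)
  }

  override def getDisplayString(children: Array[String]): String =
    "vector_values_sum(" + children.mkString(", ") + ")"
}
```

The UDF only goes through the object inspectors and never touches the Spark vector classes, which is why the struct-based representation is enough on the Hive side.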

@maropu
Member

maropu commented Jul 2, 2019

But the UDF works in Spark and not in Hive, right? I'm not sure that this is the right approach...

@uncleGen
Contributor Author

uncleGen commented Jul 2, 2019

@maropu I haven't tested the UDF in Hive, but I think it will work since it is a Hive UDF definition.

@cloud-fan
Contributor

Can we add an end-to-end test to show the value of this patch?

@SparkQA

SparkQA commented Jul 15, 2019

Test build #107684 has finished for PR 24961 at commit 3faaa00.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class VectorUDTSuite extends QueryTest

@SparkQA

SparkQA commented Jul 15, 2019

Test build #107685 has finished for PR 24961 at commit 9cadbe4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class VectorUDTSuite extends QueryTest

Review comment on mllib/pom.xml (outdated, resolved)

@HyukjinKwon
Member

ping @uncleGen

@uncleGen
Contributor Author

uncleGen commented Sep 18, 2019

Let me continue and finish this PR.

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111339 has finished for PR 24961 at commit a8e2fb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class VectorUDTSuite extends SparkFunSuite
  • class HiveUserDefinedTypeSuite extends QueryTest
  • class TestUDF extends GenericUDF

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111340 has finished for PR 24961 at commit 7138d6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class VectorUDTSuite extends SparkFunSuite
  • class HiveUserDefinedTypeSuite extends QueryTest
  • class TestUDF extends GenericUDF

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111338 has finished for PR 24961 at commit 12ee9d6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class VectorUDTSuite extends QueryTest

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111348 has finished for PR 24961 at commit 4a4e75c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Contributor Author

ping @HyukjinKwon @cloud-fan

@SparkQA

SparkQA commented Oct 24, 2019

Test build #112603 has finished for PR 24961 at commit b71273d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112757 has finished for PR 24961 at commit ab0282e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveUserDefinedTypeSuite extends QueryTest with TestHiveSingleton

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Oct 28, 2019

Test build #112762 has finished for PR 24961 at commit ab0282e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveUserDefinedTypeSuite extends QueryTest with TestHiveSingleton

@HyukjinKwon
Member

Merged to master.

@HeartSaVioR
Contributor

I've submitted a follow-up PR (#26287), as the new test is very likely failing in recent CI builds.
