Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-defined functions #19064

Closed
wants to merge 4 commits into from

Conversation

gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Aug 28, 2017

What changes were proposed in this pull request?

Add trait UserDefinedExpression to identify user-defined functions.
UDF can be expensive. In optimizer we may need to avoid executing UDF multiple times.
E.g.

table.select(UDF as 'a).select('a, ('a + 1) as 'b)

If UDF is expensive in this case, optimizer should not collapse the project to

table.select(UDF as 'a, (UDF+1) as 'b)

Currently UDF classes like PythonUDF, HiveGenericUDF are not defined in catalyst.
This PR is to add a new trait to make it easier to identify user-defined functions.

How was this patch tested?

Unit test

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a trait? It doesn't seem to be used other than to mark.

@@ -22,6 +22,9 @@ import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.types.DataType

// Trait for identifying user-defined functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
 * Common base trait for user-defined functions, including ScalaUDF, ScalaUDAF, PythonUDF, 
 * HiveSimpleUDF, HiveGenericUDF, and HiveUDAFFunction.
 */

@gatorsmile
Copy link
Member

Yes. This is a potential optimization we should do, but it should be cost based. Let us improve the resolution logics using this common base trait in this PR first.

* Common base trait for user-defined functions, including ScalaUDF, ScalaUDAF, PythonUDF,
* HiveSimpleUDF, HiveGenericUDF, and HiveUDAFFunction.
*/
trait UDFType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe UserDefinedExpression?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also sort of weird to put this in ScalaUDF.scala. Either put it in Expression.scala, or create its own file ...

@SparkQA
Copy link

SparkQA commented Aug 28, 2017

Test build #81174 has finished for PR 19064 at commit 5f508ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 28, 2017

Test build #81177 has finished for PR 19064 at commit de642d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

I'm +1 on this, it should be useful to pattern match the trait instead of handling ScalaUDF/ScalaUDAF/PythonUDF... in future optimize rules.

@SparkQA
Copy link

SparkQA commented Aug 29, 2017

Test build #81192 has finished for PR 19064 at commit 7b67cd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Aug 29, 2017

It is better to revise PR title and description to reflect the change from UDFType to UserDefinedExpression.

@SparkQA
Copy link

SparkQA commented Aug 29, 2017

Test build #81202 has finished for PR 19064 at commit be85df3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Aug 29, 2017

retest this please.

@gengliangwang gengliangwang changed the title [SPARK-21848][SQL] Add trait UDFType to identify user-defined functions [SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-defined functions Aug 29, 2017
@SparkQA
Copy link

SparkQA commented Aug 29, 2017

Test build #81204 has finished for PR 19064 at commit be85df3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

@asfgit asfgit closed this in 8fcbda9 Aug 29, 2017
@gatorsmile
Copy link
Member

Thanks! Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants