[SPARK-26216][SQL] Do not use case class as public API (UserDefinedFunction) #23178
Conversation
```scala
if (inputTypes.isDefined) {
  assert(inputTypes.get.length == nullableTypes.get.length)
}

val inputsNullSafe = if (nullableTypes.isEmpty) {
```
You can use `getOrElse` here and even inline this into the call below, but I don't really care.
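The suggestion amounts to replacing the `isEmpty`/`get` branch with `Option.getOrElse`. A minimal standalone sketch (the variable names mirror the diff, but the fallback value here is made up for illustration, not taken from the Spark source):

```scala
// Hypothetical sketch, not the actual Spark code.
val nullableTypes: Option[Seq[Boolean]] = None
val defaultNullSafe: Seq[Boolean] = Seq(true, true) // assumed fallback

// Instead of:
//   val inputsNullSafe = if (nullableTypes.isEmpty) defaultNullSafe else nullableTypes.get
// getOrElse expresses the same thing in one step, and can be inlined
// directly into whatever call consumes inputsNullSafe:
val inputsNullSafe: Seq[Boolean] = nullableTypes.getOrElse(defaultNullSafe)
```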
```scala
// This is a `var` instead of in the constructor for backward compatibility of this case class.
// TODO: revisit this case class in Spark 3.0, and narrow down the public surface.
private[sql] var nullableTypes: Option[Seq[Boolean]] = None

trait UserDefinedFunction {
```
Should we make this `sealed`? I'm not sure. Would any user ever extend this meaningfully? I kind of worry someone will start doing so; maybe they already subclass it in some cases though. Elsewhere it might help the compiler understand in `match` statements that there is only ever one type of UDF class to match on.
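What `sealed` would buy, in a minimal sketch (illustrative names, not Spark's actual hierarchy): subclasses are confined to the defining file, so the compiler can treat a `match` over the trait as exhaustive:

```scala
// Illustrative only. A sealed trait can only be extended in this source file.
sealed trait UDF {
  def nullable: Boolean
}

// The single concrete implementation, analogous to a private SparkUserDefinedFunction.
private case class SimpleUDF(f: AnyRef, nullable: Boolean) extends UDF

def describe(u: UDF): String = u match {
  // Because UDF is sealed, the compiler knows this case covers every
  // possible UDF and will not emit a missing-match warning.
  case SimpleUDF(_, n) => s"nullable=$n"
}
```

One caveat, as far as I know: `sealed` is enforced only by the Scala compiler and isn't carried into the bytecode in a way javac checks, so Java code could still implement the generated interface.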
good idea! though I'm not sure if `sealed` works for Java.

Good idea to have it sealed!
In sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala:

```diff
@@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
-case class UserDefinedFunction protected[sql] (
-    f: AnyRef,
-    dataType: DataType,
-    inputTypes: Option[Seq[DataType]]) {
-
-  private var _nameOption: Option[String] = None
-  private var _nullable: Boolean = true
-  private var _deterministic: Boolean = true
-
-  // This is a `var` instead of in the constructor for backward compatibility of this case class.
-  // TODO: revisit this case class in Spark 3.0, and narrow down the public surface.
-  private[sql] var nullableTypes: Option[Seq[Boolean]] = None
+trait UserDefinedFunction {
```
Test build #99454 has finished for PR 23178 at commit.

Test build #4448 has started for PR 23178 at commit.

Test build #99457 has finished for PR 23178 at commit.

+1 as well

Test build #99502 has finished for PR 23178 at commit.
```diff
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
```
I'm +1 for this PR, but I'm just wondering if this `@Stable` tag with the `@since 1.3.0` tag is valid here. The previous case class was stable until 2.4.x and the new trait will be stable since 3.0, but the stability is broken once at 3.0.0. Did I understand correctly?
Is it better to change it to `@Stable` with `@since 3.0.0`?
yea actually I was wondering about the same thing.
I'd go ahead and leave the Since version. The API is essentially unchanged, though there are some marginal breaking compile-time changes. But the same is true of many things we are changing in 3.0. I've tagged the JIRA with `release-notes` and will add a blurb about the change.
It's not a new API anyway, it will be weird to change since to 3.0.
Got it. Thank you, @HyukjinKwon , @srowen , @cloud-fan .
thanks for the review, merging to master!
Why not keep it an abstract class? This is going to break every application that uses UDFs, which, while allowed at a major version, seems like a pretty big annoyance.
```scala
},

// [SPARK-26216][SQL] Do not use case class as public API (UserDefinedFunction)
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.expressions.UserDefinedFunction$"),
```
Can we get rid of this in #23351?
…UserDefinedFunction

What changes were proposed in this pull request?

A followup of apache#23178, to keep binary compatibility by using an abstract class.

How was this patch tested?

Manual test. I created a simple app with Spark 2.4

```scala
object TryUDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
    import spark.implicits._
    val f1 = udf((i: Int) => i + 1)
    println(f1.deterministic)
    spark.range(10).select(f1.asNonNullable().apply($"id")).show()
    spark.stop()
  }
}
```

When I run it with current master, it fails with

```
java.lang.IncompatibleClassChangeError: Found interface org.apache.spark.sql.expressions.UserDefinedFunction, but class was expected
```

When I run it with this PR, it works.

Closes apache#23351 from cloud-fan/minor.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

It's a bad idea to use a case class as a public API, as it has a very wide surface. For example, the `copy` method, its fields, the companion object, etc.

For a particular case, `UserDefinedFunction`: it has a private constructor, and I believe we only want users to access a few methods: `apply`, `nullable`, `asNonNullable`, etc. However, all its fields, the `copy` method, and the companion object are unexpectedly public. As a result, we made many tricks to work around the binary compatibility issues.

This PR proposes to only make interfaces public, and hide implementations behind a private class. Now `UserDefinedFunction` is a pure trait, and the concrete implementation is `SparkUserDefinedFunction`, which is private.

Changing a class to an interface is not binary compatible (but is source compatible), so 3.0 is a good chance to do it.

This is the first PR to go in this direction. If it's accepted, I'll create an umbrella JIRA and fix all the public case classes.
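The public-trait/private-implementation split can be sketched roughly like this (simplified members; the real `UserDefinedFunction` exposes more methods, and `apply` is elided here):

```scala
// Rough sketch of the pattern, not the actual Spark source.

// Public surface: only the intended methods are visible to users.
trait UserDefinedFunction {
  def nullable: Boolean
  def deterministic: Boolean
  def asNonNullable(): UserDefinedFunction
}

// Private concrete implementation. It can safely remain a case class:
// its fields, copy method, and companion object are no longer public API.
private case class SparkUserDefinedFunction(
    f: AnyRef,
    nullable: Boolean,
    deterministic: Boolean) extends UserDefinedFunction {
  override def asNonNullable(): UserDefinedFunction = copy(nullable = false)
}
```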
How was this patch tested?

Existing tests.