New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26353][SQL]Add typed aggregate functions(max/min) to the example module. #23304
Conversation
Test build #100056 has finished for PR 23304 at commit
|
retest this please |
Test build #100071 has finished for PR 23304 at commit
|
retest this please |
Test build #100084 has finished for PR 23304 at commit
|
cc @rxin @cloud-fan |
See the PR #18113 |
This PR #18113 has a better solution, thanks @gatorsmile |
defee03
to
0b64ae7
Compare
This PR #18113 doesn't look like making progress for a long time so far, I take this PR over. |
Test build #100136 has finished for PR 23304 at commit
|
Test build #100139 has finished for PR 23304 at commit
|
do you resolve #18113 (comment) ? |
@cloud-fan No, I don't |
?? then are you going to resolve that comment? |
I can't solve this problem at present, someone can help me? |
How did you fix the tests? #18113 couldn't pass tests because of that problem. |
I just duplicated his code, I didn't do anything else. |
/** | ||
* Min aggregate function for floating point (double) type. | ||
* | ||
* @since 2.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's 3.0.0 now
class JavaTypedMinDouble[IN](override val f: IN => Double) | ||
extends TypedMinDouble[IN, java.lang.Double] { | ||
override def outputEncoder: Encoder[java.lang.Double] = ExpressionEncoder[java.lang.Double]() | ||
override def finish(reduction: MutableDouble): java.lang.Double = reduction.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if reduction
is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I will update
class JavaTypedMaxDouble[IN](override val f: IN => Double) | ||
extends TypedMaxDouble[IN, java.lang.Double] { | ||
override def outputEncoder: Encoder[java.lang.Double] = ExpressionEncoder[java.lang.Double]() | ||
override def finish(reduction: MutableDouble): java.lang.Double = reduction.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
||
class JavaTypedMinLong[IN](override val f: IN => Long) extends TypedMinLong[IN, java.lang.Long] { | ||
override def outputEncoder: Encoder[java.lang.Long] = ExpressionEncoder[java.lang.Long]() | ||
override def finish(reduction: MutableLong): java.lang.Long = reduction.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
||
class JavaTypedMaxLong[IN](override val f: IN => Long) extends TypedMaxLong[IN, java.lang.Long] { | ||
override def outputEncoder: Encoder[java.lang.Long] = ExpressionEncoder[java.lang.Long]() | ||
override def finish(reduction: MutableLong): java.lang.Long = reduction.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
KeyValueGroupedDataset<String, Tuple2<String, Integer>> grouped = generateGroupedDataset(); | ||
Dataset<Tuple2<String, Double>> agged = grouped.agg(typed.min(v -> (double)v._2())); | ||
Assert.assertEquals( | ||
Arrays.asList(new Tuple2<>("a", 1.0), new Tuple2<>("b", 3.0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 space indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I will update
("b", Some(-4.0), Some(-4L), Some(4.0), Some(4L))) | ||
} | ||
|
||
test("typed aggregate: empty") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test empty input with global aggregate in the java test as well?
Test build #101163 has finished for PR 23304 at commit
|
retest this please |
Test build #101170 has finished for PR 23304 at commit
|
} | ||
|
||
/** | ||
* Min aggregate function for floating point (double) type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Min
-> Max
. Actually I will elaborate it more. Please avoid abbreviation like min or max in doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok,thanks
Looks fine to me too. |
Test build #101212 has finished for PR 23304 at commit
|
retest this please |
Test build #101219 has finished for PR 23304 at commit
|
I'm kind of wondering whether it'd make sense to add these. It adds a lot of code which would incur some maintenance cost, and users can trivially implement these themselves, or just use the untyped version, and we don't need to spend time discussing whether these should follow SQL semantics or Scala semantics. |
Actually, WDYT about ..
Was having the kind of similar impression. |
IMHO it would be ideal to have common aggregate functions for typed version as well (agree to restrict them to like sum/avg/count/min/max), otherwise end users using typed API have to implement such basic thing on their own. Btw, TODOs in source code are tend to being considered as things to contribute. Might be ideal to remove these false alarm once we don't see its needs. |
Hi @10110346 , can you move these new functions to the "example" package? |
Ok, I will do it, thanks |
04c0a22
to
bbb820b
Compare
Test build #102382 has finished for PR 23304 at commit
|
@@ -44,6 +45,12 @@ object SimpleTypedAggregator { | |||
println("running typed average:") | |||
ds.groupByKey(_._1).agg(new TypedAverage[(Long, Long)](_._2.toDouble).toColumn).show() | |||
|
|||
println("running typed Minimum:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minimum
-> minimum
to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
bbb820b
to
20aef3e
Compare
Test build #102385 has finished for PR 23304 at commit
|
} | ||
|
||
override def bufferEncoder: Encoder[MutableDouble] = Encoders.kryo[MutableDouble] | ||
override def outputEncoder: Encoder[Option[Double]] = ExpressionEncoder[Option[Double]]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use Encoders.product[Option[Double]]
? Or we can use Encoders.DOUBLE
and use null to represent None
.
The thing is, ExpressionEncoder
is an internal class and we should avoid exposing it in the example module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
20aef3e
to
26a448d
Compare
Test build #102440 has finished for PR 23304 at commit
|
thanks, merging to master! |
@cloud-fan It seems merging has not succeeded, thanks |
Merged to master |
What changes were proposed in this pull request?
Add typed aggregate functions(max/min) to the example module.
How was this patch tested?
Manual testing:
running typed minimum:
running typed maximum: