Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26861][SQL] deprecate typed sum/count/average #23763

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

These builtin typed aggregate functions are not very useful:

  1. users can just call the untyped ones and turn the resulting dataframe to a dataset. It has better performance.
  2. the typed aggregate functions have subtle different behaviors regarding empty input.

I think we should get rid of these builtin typed agg functions and suggest users to use the untyped ones.

However, these functions are still useful as a demo of the Aggregator API, so I copied them to the example module.

How was this patch tested?

N/A

@cloud-fan
Copy link
Contributor Author

cc @rxin @10110346 @HyukjinKwon

@HyukjinKwon
Copy link
Member

Yea, deprecating way works to me. +1.

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102252 has finished for PR 23763 at commit a7faa2b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class TypedSum[IN](val f: IN => Long) extends Aggregator[IN, Long, Long]
  • class TypedCount[IN](val f: IN => Any) extends Aggregator[IN, Long, Long]
  • class TypedAverage[IN](val f: IN => Double) extends Aggregator[IN, (Double, Long), Double]

}
// scalastyle:on println

class TypedSum[IN](val f: IN => Long) extends Aggregator[IN, Long, Long] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, +1 to have an example somewhere like this since Aggregator seems to be still an API, it looks better to let users know how to use it. If we go ahead with getting these out, the current PR looks quite good and conservative.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102259 has finished for PR 23763 at commit a7faa2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class TypedSum[IN](val f: IN => Long) extends Aggregator[IN, Long, Long]
  • class TypedCount[IN](val f: IN => Any) extends Aggregator[IN, Long, Long]
  • class TypedAverage[IN](val f: IN => Double) extends Aggregator[IN, (Double, Long), Double]

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from me but I think I should leave it to @rxin

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 8656af9 Feb 15, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

These builtin typed aggregate functions are not very useful:
1. users can just call the untyped ones and turn the resulting dataframe to a dataset. It has better performance.
2. the typed aggregate functions have subtle different behaviors regarding empty input.

I think we should get rid of these builtin typed agg functions and suggest users to use the untyped ones.

However, these functions are still useful as a demo of the `Aggregator` API, so I copied them to the example module.

## How was this patch tested?

N/A

Closes apache#23763 from cloud-fan/example.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants