[SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python #41435

Closed
wants to merge 4 commits into from


zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Jun 2, 2023

What changes were proposed in this pull request?

Add the following functions:

  • ceiling
  • e
  • pi
  • ln
  • negative
  • positive
  • power
  • sign
  • std
  • width_bucket

to:

  • Scala API
  • Python API
  • Spark Connect Scala Client
  • Spark Connect Python Client

This PR also adds negate (which already exists in Scala API and SCSC) to Python API and SCPC.
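Of the functions listed above, width_bucket has the least obvious semantics, so a small illustration may help. The following is a pure-Python sketch of equi-width bucketing, not the code added by this PR. It assumes the usual SQL behaviour (bucket 0 below the range, bucket `num_buckets + 1` at or above the upper bound) and, as a simplification, only handles an ascending range; the parameter names are made up for the example.

```python
def width_bucket(value, min_value, max_value, num_buckets):
    """Sketch of equi-width bucketing for an ascending range (min_value < max_value).

    Assumption: invalid input (non-positive bucket count, or a range that is
    not strictly ascending) yields None, standing in for SQL NULL. Descending
    ranges are deliberately not handled in this sketch.
    """
    if num_buckets <= 0 or not (min_value < max_value):
        return None
    if value < min_value:
        return 0  # underflow bucket
    if value >= max_value:
        return num_buckets + 1  # overflow bucket
    # Scale the offset into [0, num_buckets) and shift to 1-based numbering.
    return int(num_buckets * (value - min_value) / (max_value - min_value)) + 1


print(width_bucket(5.3, 0.2, 10.6, 5))   # 3
print(width_bucket(-2.1, 1.3, 3.4, 3))   # 0
print(width_bucket(8.1, 0.0, 5.7, 4))    # 5
```

The lower bound is inclusive and the upper bound is exclusive, which is why a value equal to `max_value` lands in the overflow bucket.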

Why are the changes needed?

for parity

Does this PR introduce any user-facing change?

yes, new functions

How was this patch tested?

added unit tests / doctests

@zhengruifeng zhengruifeng marked this pull request as draft June 2, 2023 13:09
@zhengruifeng zhengruifeng force-pushed the sql_func_math branch 2 times, most recently from 44441ac to 9572ce2 Compare June 5, 2023 08:06
@zhengruifeng zhengruifeng marked this pull request as ready for review June 5, 2023 08:06
@zhengruifeng zhengruifeng force-pushed the sql_func_math branch 3 times, most recently from 815c93f to b882811 Compare June 6, 2023 06:12
@zhengruifeng zhengruifeng force-pushed the sql_func_math branch 3 times, most recently from c83cd0b to 8ddad2a Compare June 7, 2023 02:11
@HyukjinKwon
Member

oh also might need to put them in Python reference doc .rst file

log
log10
log1p
log2
negative
Contributor Author

@HyukjinKwon they were added here, except for negate; I will add it.

BTW, in the last commit I added negate (which only existed in Scala) to the Python side for parity between Python and Scala

@HyukjinKwon
Member

LGTM

@zhengruifeng zhengruifeng force-pushed the sql_func_math branch 2 times, most recently from f5028b9 to 200fe37 Compare June 7, 2023 23:56
@zhengruifeng
Contributor Author

@HyukjinKwon it seems that sql - other has started failing again ...

@zhengruifeng
Contributor Author

on the sql side, this PR only touches MathFunctionsSuite in sql - slow

the failure in sql - other should be unrelated, I am going to merge it now

@zhengruifeng
Contributor Author

merged to master

@LuciferYang
Contributor

@HyukjinKwon it seems that sql - other has started failing again ...

I will investigate tomorrow, it is a little late today

There is one test case that may have started failing after this was merged, and I am trying to fix it in #41519

zhengruifeng pushed a commit that referenced this pull request Jun 8, 2023
…ction parity` in `DataFrameFunctionsSuite`

### What changes were proposed in this pull request?
This PR removes `ceiling`, `negative`, `std`, and `sign` from `excludedSqlFunctions` so that the `DataFrame function and SQL functon parity` test in `DataFrameFunctionsSuite` passes. These four functions were added to `sql.functions` in #41435, so `excludedSqlFunctions` needs to be updated at the same time.
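The kind of check that produces a failure like the one below can be sketched in a few lines. This is an illustration in plain Python, not the Scala test itself: it assumes the failing assertion requires that no name in the exclusion list is also available as a DataFrame function, which is inferred from the `Set(...) was not empty` message. The function sets are small made-up subsets, and `sql_only_example` is a hypothetical name.

```python
def stale_exclusions(dataframe_functions, excluded_sql_functions):
    """Return excluded names that are in fact available as DataFrame functions."""
    return set(dataframe_functions) & set(excluded_sql_functions)


# Illustrative subsets only; the real lists are much longer.
dataframe_functions = {
    "ceil", "ceiling", "negate", "negative", "sign", "signum", "std", "stddev",
}
# "sql_only_example" is a hypothetical SQL-only function that stays excluded.
excluded_before = {"ceiling", "negative", "std", "sign", "sql_only_example"}
excluded_after = excluded_before - {"ceiling", "negative", "std", "sign"}

# Before the fix the intersection is non-empty, so an emptiness check fails.
print(sorted(stale_exclusions(dataframe_functions, excluded_before)))
# prints ['ceiling', 'negative', 'sign', 'std']

# After dropping the four names the intersection is empty and the check passes.
print(sorted(stale_exclusions(dataframe_functions, excluded_after)))
# prints []
```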

### Why are the changes needed?
Fix `DataFrame function and SQL function parity` in `DataFrameFunctionsSuite`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual check, run

```
build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
```

**Before**

```
[info] DataFrameFunctionsSuite:
23:20:51.858 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - DataFrame function and SQL functon parity *** FAILED *** (340 milliseconds)
[info]   Set("ceiling", "negative", "std", "sign") was not empty (DataFrameFunctionsSuite.scala:115)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.DataFrameFunctionsSuite.$anonfun$new$1(DataFrameFunctionsSuite.scala:115)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:221)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:67)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:67)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Suite.run(Suite.scala:1114)
[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:67)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:67)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
...
[info] Run completed in 27 seconds, 818 milliseconds.
[info] Total number of tests run: 123
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 122, failed 1, canceled 0, ignored 0, pending 0
```

**After**

```
[info] Run completed in 27 seconds, 338 milliseconds.
[info] Total number of tests run: 123
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 123, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #41519 from LuciferYang/fix-df-functions-suite.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…nd Python

Closes apache#41435 from zhengruifeng/sql_func_math.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
…ction parity` in `DataFrameFunctionsSuite`

3 participants