Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Nov 17, 2025

What changes were proposed in this pull request?

SPARK-54020 added some new test cases in PythonPipelineSuite. This pr incorporates assume(PythonTestDepsChecker.isConnectDepsAvailable) for these test cases to ensure that the tests are skipped rather than failing when PyConnect dependencies are missing.

Why are the changes needed?

Enhance the robustness of test cases. Prior to this, when executing build/sbt "connect/testOnly org.apache.spark.sql.connect.pipelines.PythonPipelineSuite":

[info] - reading internal datasets outside query function that trigger eager analysis or execution will fail (spark.sql("SELECT * FROM src")) *** FAILED *** (4 milliseconds)
[info]   "org.apache.spark.sql.connect.PythonTestDepsChecker.isConnectDepsAvailable was false" did not contain "TABLE_OR_VIEW_NOT_FOUND" (PythonPipelineSuite.scala:546)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.connect.pipelines.PythonPipelineSuite.$anonfun$new$43(PythonPipelineSuite.scala:546)
[info]   at org.apache.spark.sql.connect.pipelines.PythonPipelineSuite.$anonfun$new$43$adapted(PythonPipelineSuite.scala:532)
[info]   at org.apache.spark.SparkFunSuite.$anonfun$gridTest$2(SparkFunSuite.scala:241)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
[info]   at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
...
[info] *** 24 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.connect.pipelines.PythonPipelineSuite
[error] (connect / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass Github Actions
  • Manually verify that the relevant tests will no longer fail when PyConnect dependencies are missing.

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft November 17, 2025 03:47
@LuciferYang
Copy link
Contributor Author

Test first

@LuciferYang LuciferYang marked this pull request as ready for review November 17, 2025 04:52
LuciferYang added a commit that referenced this pull request Nov 17, 2025
…eSuite` to skip tests when PyConnect dependencies is not available

### What changes were proposed in this pull request?
SPARK-54020 added some new test cases in `PythonPipelineSuite`. This pr incorporates `assume(PythonTestDepsChecker.isConnectDepsAvailable)` for these test cases to ensure that the tests are skipped rather than failing when PyConnect dependencies are missing.

### Why are the changes needed?
Enhance the robustness of test cases. Prior to this, when executing `build/sbt "connect/testOnly org.apache.spark.sql.connect.pipelines.PythonPipelineSuite"`:

```
[info] - reading internal datasets outside query function that trigger eager analysis or execution will fail (spark.sql("SELECT * FROM src")) *** FAILED *** (4 milliseconds)
[info]   "org.apache.spark.sql.connect.PythonTestDepsChecker.isConnectDepsAvailable was false" did not contain "TABLE_OR_VIEW_NOT_FOUND" (PythonPipelineSuite.scala:546)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.connect.pipelines.PythonPipelineSuite.$anonfun$new$43(PythonPipelineSuite.scala:546)
[info]   at org.apache.spark.sql.connect.pipelines.PythonPipelineSuite.$anonfun$new$43$adapted(PythonPipelineSuite.scala:532)
[info]   at org.apache.spark.SparkFunSuite.$anonfun$gridTest$2(SparkFunSuite.scala:241)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
[info]   at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
[info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
...
[info] *** 24 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.connect.pipelines.PythonPipelineSuite
[error] (connect / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass Github Actions
- Manually verify that the relevant tests will no longer fail when PyConnect dependencies are missing.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53088 from LuciferYang/SPARK-54375.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 722bcc0)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang
Copy link
Contributor Author

Merged into master and branch-4.1. Thanks @zhengruifeng

@dongjoon-hyun
Copy link
Member

Thank you, @LuciferYang and @zhengruifeng !

cc @SCHJonathan and @sryza , too.

@sryza
Copy link
Contributor

sryza commented Nov 17, 2025

LGTM, though I wonder if there's some way to implement this that doesn't require remembering to add it to every single test?

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Nov 18, 2025

LGTM, though I wonder if there's some way to implement this that doesn't require remembering to add it to every single test?

@sryza We can override the function test(testName: String, testTags: Tag*)(testFun: => Any)(implicit pos: Position): Unit to perform a default check of PythonTestDepsChecker.isConnectDepsAvailable. Do all tests within the PythonPipelineSuite need to check for these Python module dependencies? Can we simply limit the scope of the overridden test method to the PythonPipelineSuite?

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Nov 18, 2025

LGTM, though I wonder if there's some way to implement this that doesn't require remembering to add it to every single test?

@sryza for this, I've submitted a followup pr: #53106, do you think this is acceptable? Or do you have any better suggestions for modifications?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants