Skip to content

Conversation

@vinodkc
Copy link
Contributor

@vinodkc vinodkc commented Nov 26, 2025

What changes were proposed in this pull request?

This PR fixes flaky test failures in SparkConnectJdbcDataTypeSuite by adding defensive exception handling in SparkConnectStatement.close(). It also fixes a bug where the statement was not properly marked as closed.

Why are the changes needed?

The test get binary type (and potentially others) was failing ~27% of the time with:

org.apache.spark.SparkException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exceptionCaused by: 
java.net.ConnectException: Connection refused  at SparkConnectStatement.close(SparkConnectStatement.scala:38)

Root Cause: During statement cleanup, close() calls interruptOperation() which makes a gRPC network call.

This fails when:

  • The operation has already completed successfully
  • The server is under load or temporarily unavailable
  • Network timing issues during cleanup
  • The test itself passes - the failure occurs only in the cleanup phase, causing spurious test failures.

Does this PR introduce any user-facing change?

No. This only affects internal test reliability.

How was this patch tested?

Verified the fix follows the established pattern in SparkSession.close() (lines 679-690)
The existing test suite validates correct behavior
The exception handling only affects cleanup, not core functionality

Was this patch authored or co-authored using generative AI tooling?

No

@vinodkc vinodkc changed the title [SPARK-53934][SQL][FOLLOW] Close Spark connect client connection and mark as closed [SPARK-53934][SQL][FOLLOWUP] Close Spark connect client connection and mark as closed Nov 26, 2025
@vinodkc
Copy link
Contributor Author

vinodkc commented Nov 26, 2025

@dongjoon-hyun , Please review

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @pan3793 , @LuciferYang , too.

try {
conn.spark.interruptOperation(operationId)
} catch {
case _: Exception =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be more specific exception, @vinodkc ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we can match the exceptions explicitly. For example,

java.net.ConnectException

Copy link
Member

@pan3793 pan3793 Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the utility method Utils.closeQuietly?

nvm, this is fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the utility method Utils.closeQuietly?

+1

resultSet = null
}
closed = false
closed = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, this is an obvious bug.

@cloud-fan
Copy link
Contributor

The server is under load or temporarily unavailable

Does #53233 help here?

@cloud-fan
Copy link
Contributor

The operation has already completed successfully

We should fully understand the issue. Do you know where we fail to cancel an operation when the operation is already done? What kind of exception we throw?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants