Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45822][CONNECT] SparkConnectSessionManager may look up a stopped sparkcontext #43701

Closed
wants to merge 2 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Nov 7, 2023

What changes were proposed in this pull request?

This PR checks whether the sc is still functional before cloning a new isolated session from it.

Why are the changes needed?

SparkSession.active is a thread-local value and not be updated by other thread.

This causes https://github.com/LuciferYang/spark/actions/runs/6767960232/job/18426049162

- ReleaseSession: session with different session_id or user_id allowed after release *** FAILED *** (9 milliseconds)
[info]   org.apache.spark.SparkException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
[info] This stopped SparkContext was created at:
[info] 
[info] org.apache.spark.sql.connect.service.SparkConnectSessionHolderSuite.beforeAll(SparkConnectSessionHolderSuite.scala:37)
[info] org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)

For shared spark sessions in tests, these sessions are created, stopped, and retrieved in different threads.

Does this PR introduce any user-facing change?

no

How was this patch tested?

I ran build/sbt "connect/testOnly *SparkConnect*" locally and the test consistently failed w/o this patch. Otherwise, it passed.

Was this patch authored or co-authored using generative AI tooling?

no

@LuciferYang
Copy link
Contributor

LuciferYang commented Nov 7, 2023

cc @panbingkun he is also concerned about this issue

also cc @juliuszsompolski

@LuciferYang
Copy link
Contributor

If the pandas environment is available, run

build/sbt "connect/testOnly *SparkConnectSessionHolderSuite *SparkConnectServiceE2ESuite"

also can reproduce this issue locally w/o this patch

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

I have verified it locally and it is effective. Thank you very much @yaooqinn

Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @yaooqinn !
I believe @grundprinzip was reporting hitting that to me, and it was on my list to look into.

@LuciferYang
Copy link
Contributor

Merged into master for Spark 4.0. Thanks @yaooqinn @grundprinzip @juliuszsompolski

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 7, 2023

Thank all!

@yaooqinn yaooqinn deleted the SPARK-45822 branch November 7, 2023 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants