[SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session #44898

Closed
wants to merge 8 commits into from

Conversation

WweiL
Contributor

@WweiL WweiL commented Jan 26, 2024

What changes were proposed in this pull request?

In Scala, there is only one streaming query manager per Spark session:

scala> spark.streams
val res0: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res1: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res2: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res3: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

In Python, this currently does not hold for either Spark Connect or vanilla Spark:

>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7c10>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f71f0>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7be0>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7c40>

This PR makes the Spark session reuse its existing streaming query manager instead of creating a new one on every access.
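The reuse pattern can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this patch: `SessionSketch`, `StreamingQueryManagerSketch`, and the `_sqm` and `_lock` attributes are hypothetical stand-ins rather than PySpark names, and the lock is just one possible way to keep the lazy creation safe across threads.

```python
import threading


class StreamingQueryManagerSketch:
    """Hypothetical stand-in for a streaming query manager."""

    def __init__(self, session):
        self.session = session


class SessionSketch:
    """Hypothetical stand-in for a Spark session."""

    def __init__(self):
        self._sqm = None  # manager is created lazily on first access
        self._lock = threading.Lock()

    @property
    def streams(self):
        # Create the manager once, then return the same object on every call.
        with self._lock:
            if self._sqm is None:
                self._sqm = StreamingQueryManagerSketch(self)
            return self._sqm


session = SessionSketch()
first = session.streams
second = session.streams
print(first is second)  # True
```

With this shape, repeated accesses to `streams` return the same object, which mirrors the Scala REPL output above where every call shows the same instance.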

Why are the changes needed?

Python should align with the Scala behavior.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@WweiL
Contributor Author

WweiL commented Jan 26, 2024

My local env failed to build, so let's wait and see whether CI passes.

WweiL and others added 2 commits January 26, 2024 00:58
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
@WweiL
Contributor Author

WweiL commented Jan 28, 2024

Hi Hyukjin @HyukjinKwon, I think this is ready to be merged! : )

@HyukjinKwon
Member

Merged to master.

WweiL added a commit to WweiL/oss-spark that referenced this pull request May 2, 2024
…ame Spark Session

In Scala, there is only one streaming query manager per Spark session:

```
scala> spark.streams
val res0: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res1: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res2: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba

scala> spark.streams
val res3: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager@46bb8cba
```

In Python, this currently does not hold for either Spark Connect or vanilla Spark:

```
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7c10>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f71f0>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7be0>
>>> spark.streams
<pyspark.sql.connect.streaming.query.StreamingQueryManager object at 0x1011f7c40>
```
This PR makes the Spark session reuse its existing streaming query manager instead of creating a new one on every access.

Python should align with the Scala behavior.

No

Added a unit test.

No

Closes apache#44898 from WweiL/SPARK-46873-sqm-reuse.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>