Skip to content

[SPARK-45798][CONNECT] Assert server-side session ID#43664

Closed
grundprinzip wants to merge 18 commits intoapache:masterfrom
grundprinzip:SPARK-45798
Closed

[SPARK-45798][CONNECT] Assert server-side session ID#43664
grundprinzip wants to merge 18 commits intoapache:masterfrom
grundprinzip:SPARK-45798

Conversation

@grundprinzip
Copy link
Contributor

What changes were proposed in this pull request?

Without this patch, when the server would restart because of an abnormal condition, the client would not realize that this be the case. For example, when a driver OOM occurs and the driver is restarted, the client would not realize that the server is restarted and a new session is assigned.

This patch fixes this behavior and asserts that the server side session ID does not change during the connection. If it does change it throws an exception like this:

>>> spark.range(10).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py", line 1710, in collect
    table, schema = self._to_table()
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py", line 1722, in _to_table
    table, schema = self._session.client.to_table(query, self._plan.observations)
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 839, in to_table
    table, schema, _, _, _ = self._execute_and_fetch(req, observations)
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1295, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req, observations):
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1273, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1521, in _handle_error
    raise error
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1266, in _execute_and_fetch_as_iterator
    yield from handle_response(b)
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1193, in handle_response
    self._verify_response_integrity(b)
  File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1622, in _verify_response_integrity
    raise SparkConnectException(
pyspark.errors.exceptions.connect.SparkConnectException: Received incorrect server side session identifier for request. Please restart Spark Session. (9493c83d-cfa4-488f-9522-838ef3df90bf != c5302e8f-170d-477e-908d-299927b68fd8)

Why are the changes needed?

Stability

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Existing tests cover the basic invariant.
  • Added new tests.

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-45798] Assert server-side session ID [SPARK-45798][CONNECT] Assert server-side session ID Nov 5, 2023
Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Would be good if someone knowing python looked at it (@HyukjinKwon ?) but it looks good to me.

@juliuszsompolski
Copy link
Contributor

https://github.com/grundprinzip/spark/actions/runs/6796293583/job/18489636721 looks like it flaked out after already running all connect tests successfuly.

@HyukjinKwon
Copy link
Member

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Dec 13, 2023
…Info

### What changes were proposed in this pull request?

Small followup to #43664 - add serverSessionId to SessionHolderInfo.

### Why are the changes needed?

SessionHolderInfo should contain this kind of information about the session.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

NA

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44334 from juliuszsompolski/SPARK-45798-fup.

Authored-by: Juliusz Sompolski <julek@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments