[SPARK-50135][BUILD] Upgrade ZooKeeper to 3.9.3 #48666

panbingkun wants to merge 1 commit into apache:master
Conversation
It looks good to me, @panbingkun. Is the PR ready?

Yep, thanks. ❤️
dongjoon-hyun left a comment:
+1, LGTM. Thank you, @panbingkun.
Merged to master.
Surprisingly, this caused test failures in Spark Connect specifically on Mac ... With this PR: ... Without this PR: ... Let me revert this for now because the change is sort of trivial, but it seems to affect all developers on Mac.
Thanks @HyukjinKwon for addressing this issue. Python developers had been blocked for 2 weeks. IIRC, this is not the first time that some Python tests fail only on macOS. Is it possible to add a lightweight macOS GA job to guard basic PySpark functionality? @dongjoon-hyun @Yikun @LuciferYang
We have a macOS test at https://github.com/apache/spark/actions/workflows/build_maven_java21_macos15.yml but we're not running PySpark tests there now. We could improve this further.
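As a minimal sketch of such a job (only the test command below appears in this thread; the CI wiring around it is assumed), a lightweight macOS run could exercise one small PySpark Connect module:

```shell
# Hypothetical step for a lightweight macOS CI job: run a single PySpark
# Connect test module to guard basic functionality. The run-tests
# invocation is the same one used elsewhere in this thread.
./python/run-tests --python-executables=python3 \
  --testnames "pyspark.sql.tests.connect.test_connect_collection"
```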
Thank you for helping to fix it. Let me investigate it. Thanks! |
Thank you for the heads-up and recovery.
Are all current PySpark tests run in a container environment? Even if the OS is specified as macOS, are the existing images still based on Ubuntu? Or is this Python-only macOS job supposed to run on a physical machine?
I think I have identified this issue and will submit a new PR this afternoon to solve it. |
Yes, they always run as container jobs.
(pyspark) ➜ jars git:(master) ✗ ls -1 netty-*4*
netty-all-4.1.110.Final.jar
netty-buffer-4.1.113.Final.jar
netty-codec-4.1.113.Final.jar
netty-codec-http-4.1.110.Final.jar
netty-codec-http2-4.1.110.Final.jar
netty-codec-socks-4.1.110.Final.jar
netty-common-4.1.113.Final.jar
netty-handler-4.1.113.Final.jar
netty-handler-proxy-4.1.110.Final.jar
netty-resolver-4.1.113.Final.jar
netty-tcnative-boringssl-static-2.0.66.Final-linux-aarch_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-linux-x86_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-osx-aarch_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-osx-x86_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-windows-x86_64.jar
netty-transport-4.1.113.Final.jar
netty-transport-classes-epoll-4.1.113.Final.jar
netty-transport-classes-kqueue-4.1.110.Final.jar
netty-transport-native-epoll-4.1.113.Final-linux-aarch_64.jar
netty-transport-native-epoll-4.1.113.Final-linux-riscv64.jar
netty-transport-native-epoll-4.1.113.Final-linux-x86_64.jar
netty-transport-native-kqueue-4.1.110.Final-osx-aarch_64.jar
netty-transport-native-kqueue-4.1.110.Final-osx-x86_64.jar
netty-transport-native-unix-common-4.1.113.Final.jar

and

======================================================================
ERROR [0.002s]: test_to_pandas (pyspark.sql.tests.connect.test_connect_collection.SparkConnectCollectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/tests/connect/test_connect_collection.py", line 108, in test_to_pandas
self.connect.sql(query).toPandas(),
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/session.py", line 753, in sql
data, properties, ei = self.client.execute_command(cmd.command(self._client))
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/client/core.py", line 1109, in execute_command
data, _, metrics, observed_metrics, properties = self._execute_and_fetch(
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/client/core.py", line 1517, in _execute_and_fetch
for response in self._execute_and_fetch_as_iterator(
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/client/core.py", line 1494, in _execute_and_fetch_as_iterator
self._handle_error(error)
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/client/core.py", line 1764, in _handle_error
self._handle_rpc_error(error)
File "/Users/panbingkun/Developer/spark/spark-community/python/pyspark/sql/connect/client/core.py", line 1849, in _handle_rpc_error
raise SparkConnectGrpcException(str(rpc_error)) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Encountered end-of-stream mid-frame"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-11-06T16:13:26.71221+08:00", grpc_status:13, grpc_message:"Encountered end-of-stream mid-frame"}"
>
----------------------------------------------------------------------
Ran 9 tests in 12.791s
FAILED (errors=5)
Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.test_connect_collection.SparkConnectCollectionTests-20241106161316.xml
Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.test_connect_basic.SparkConnectSQLTestCase-20241106161316.xml
Had test failures in pyspark.sql.tests.connect.test_connect_collection with python3; see logs.
(pyspark) ➜ jars git:(master) ✗ ls -1 netty-*4*
netty-all-4.1.110.Final.jar
netty-buffer-4.1.110.Final.jar
netty-codec-4.1.110.Final.jar
netty-codec-http-4.1.110.Final.jar
netty-codec-http2-4.1.110.Final.jar
netty-codec-socks-4.1.110.Final.jar
netty-common-4.1.110.Final.jar
netty-handler-4.1.110.Final.jar
netty-handler-proxy-4.1.110.Final.jar
netty-resolver-4.1.110.Final.jar
netty-tcnative-boringssl-static-2.0.66.Final-linux-aarch_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-linux-x86_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-osx-aarch_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-osx-x86_64.jar
netty-tcnative-boringssl-static-2.0.66.Final-windows-x86_64.jar
netty-transport-4.1.110.Final.jar
netty-transport-classes-epoll-4.1.110.Final.jar
netty-transport-classes-kqueue-4.1.110.Final.jar
netty-transport-native-epoll-4.1.110.Final-linux-aarch_64.jar
netty-transport-native-epoll-4.1.110.Final-linux-riscv64.jar
netty-transport-native-epoll-4.1.110.Final-linux-x86_64.jar
netty-transport-native-kqueue-4.1.110.Final-osx-aarch_64.jar
netty-transport-native-kqueue-4.1.110.Final-osx-x86_64.jar
netty-transport-native-unix-common-4.1.110.Final.jar

And

(pyspark) ➜ spark-community git:(master) ✗ ./python/run-tests --python-executables=python3 --testnames "pyspark.sql.tests.connect.test_connect_collection"
Running PySpark tests. Output is in /Users/panbingkun/Developer/spark/spark-community/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.test_connect_collection']
python3 python_implementation is CPython
python3 version is: Python 3.9.19
Starting test(python3): pyspark.sql.tests.connect.test_connect_collection (temp output: /Users/panbingkun/Developer/spark/spark-community/python/target/f5aa47b5-2b52-4106-a997-30e54b1316e6/python3__pyspark.sql.tests.connect.test_connect_collection__5e2svetz.log)
Finished test(python3): pyspark.sql.tests.connect.test_connect_collection (15s)
Tests passed in 15 seconds
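The contrast above is the key signal: with the PR, the jar listing mixes 4.1.110 and 4.1.113 netty artifacts, while without it every module is 4.1.110. A quick way to surface such a mismatch (the directory names here are illustrative snapshots, not paths from this thread):

```shell
# Hypothetical comparison: snapshot the jars/ directory before and after
# applying the PR, then diff the netty artifacts to expose version skew.
diff <(ls -1 with-pr-jars/netty-*4*) <(ls -1 without-pr-jars/netty-*4*)
```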
How was '4.1.113' introduced?
Here, what is excluded is the netty 3.x dependency. The groupIds for netty 3.x and 4.x are different, so that exclusion does not cover the 4.x artifacts.
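A hedged way to trace where the 4.1.113 artifacts enter the dependency graph (a standard Maven invocation, not a command taken from this thread):

```shell
# Filter the dependency tree down to io.netty entries to see which direct
# dependency (e.g. zookeeper) pulls in the 4.1.113 modules. netty 3.x
# historically shipped under different coordinates (e.g.
# org.jboss.netty:netty), which is why a 3.x exclusion misses these.
./build/mvn dependency:tree -Dincludes=io.netty
```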
Based on the above conclusions and analysis, I believe that any compilation based on ...
A new PR for it: #48771
We should change it to exclude from `zookeeper` instead of ...
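If such an exclusion were added under the `zookeeper` dependency, a quick check like the following could confirm whether it took effect (the build command matches the one used later in this thread; the output path is an assumption for a Scala 2.13 build of master):

```shell
# Rebuild, then look for any remaining 4.1.113 netty jars in the assembly
# output (path assumed, not from the thread).
./build/sbt clean package
ls -1 assembly/target/scala-2.13/jars/ | grep 'netty-.*4\.1\.113' \
  || echo "no 4.1.113 netty jars"
```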
It doesn't seem to work; I've tried it, and I can confirm it again.
The final solution is to upgrade ...
### What changes were proposed in this pull request?

The pr aims to upgrade `ZooKeeper` from `3.9.2` to `3.9.3`.

This PR is to fix potential issues with PR #48666.

### Why are the changes needed?

The full release notes: https://zookeeper.apache.org/doc/r3.9.3/releasenotes.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GA.
- Manually check:

```shell
./build/sbt -Phadoop-3 -Pkubernetes -Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pyarn -Phadoop-cloud -Pspark-ganglia-lgpl -Phive -Pjvm-profiler clean package
[info] Note: Some input files use or override a deprecated API.
[info] Note: Recompile with -Xlint:deprecation for details.
[warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
[success] Total time: 272 s (04:32), completed Nov 6, 2024, 4:29:52 PM
```

```shell
(pyspark) ➜ spark-community git:(SPARK-50135_FOLLOWUP) ✗ ./python/run-tests --python-executables=python3 --testnames "pyspark.sql.tests.connect.test_connect_collection"
Running PySpark tests. Output is in /Users/panbingkun/Developer/spark/spark-community/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.test_connect_collection']
python3 python_implementation is CPython
python3 version is: Python 3.9.19
Starting test(python3): pyspark.sql.tests.connect.test_connect_collection (temp output: /Users/panbingkun/Developer/spark/spark-community/python/target/097bd7e0-9311-4484-ae2d-c0f4c63fc6f9/python3__pyspark.sql.tests.connect.test_connect_collection__8dzaeio9.log)
Finished test(python3): pyspark.sql.tests.connect.test_connect_collection (14s)
Tests passed in 14 seconds
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48771 from panbingkun/SPARK-50135_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>



### What changes were proposed in this pull request?

The pr aims to upgrade `ZooKeeper` from `3.9.2` to `3.9.3`.

### Why are the changes needed?

The full release notes: https://zookeeper.apache.org/doc/r3.9.3/releasenotes.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.