[SPARK-37342][BUILD] Upgrade Apache Arrow to 6.0.0 #34613
|
cc @BryanCutler FYI |
BryanCutler
left a comment
LGTM, @sunchao have you tried running PySpark tests with PyArrow 6.0.0?
|
Kubernetes integration test starting |
|
@BryanCutler eh I didn't - was hoping that the Spark CI would help with that. Do you know how I can test that? |
|
Oh, it should better be tested @sunchao. Running it regularly should work (https://spark.apache.org/developer-tools.html) but with |
|
Currently CI uses PyArrow < 5.0.0 (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L222) |
|
Kubernetes integration test status failure |
|
Thanks @HyukjinKwon, you mean run tests like: python/run-tests --testnames pyspark.sql.tests.test_arrow but with Let me do it soon. |
|
oh you can do, instead:
pip install -r dev/requirements.txt
pip install pyarrow==6.0.0
python/run-tests --modules pyspark-sql
that would verify all the things 👍 |
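Before running the suite, it may help to confirm that the interpreter actually picks up the PyArrow version that was just installed. The following is only a minimal sketch, not part of the PR: the helper names (`installed_version`, `major_version`, `matches_expected`) are hypothetical, and it assumes Python 3.8+ so that `importlib.metadata` is available.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "pyarrow") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading major component of a dotted version string."""
    return int(version.split(".", 1)[0])


def matches_expected(version: Optional[str], expected_major: int = 6) -> bool:
    """True only when a version is present and its major part matches."""
    return version is not None and major_version(version) == expected_major


if __name__ == "__main__":
    found = installed_version()
    print(f"pyarrow installed: {found}; matches 6.x: {matches_expected(found)}")
```

If this reports a version other than 6.x, the tests would be exercising whatever PyArrow the environment already had, not the one the PR targets.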
|
Test build #145254 has finished for PR 34613 at commit
|
+1, LGTM if @sunchao succeeded with @HyukjinKwon 's direction.
Thanks, @sunchao , @BryanCutler , @HyukjinKwon
|
Merged to master. |
|
Thanks for the merge @HyukjinKwon ! Sorry for the delay, I'm still trying to test this. I followed the steps you mentioned above: and also set Do you know what I could have missed here? |
|
Oh, |
|
the full command I use:
./build/mvn -DskipTests -Phive-2.3 -Phive clean package # can also be an sbt build, the same as the GA build uses.
export SPARK_HOME=`pwd`
./python/run-tests --python-executables=python3 --modules pyspark-sql |
|
and .. if you're running on a Mac, should also set: export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES |
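The environment setup described in the last two comments can be sketched as a small Python helper. This is an illustration only: `build_env` and `build_command` are hypothetical names, the script just prints what it would run, and it assumes the Spark build step has already been done from the repository root.

```python
import os
import sys
from typing import Dict, List, Mapping


def build_env(base: Mapping[str, str], spark_home: str, platform: str) -> Dict[str, str]:
    """Copy `base` and add the variables mentioned in the thread."""
    env = dict(base)
    env["SPARK_HOME"] = spark_home
    # Only needed on macOS, per the comment above.
    if platform == "darwin":
        env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return env


def build_command(modules: str = "pyspark-sql", python_exe: str = "python3") -> List[str]:
    """Assemble the run-tests invocation quoted in the thread as an argument list."""
    return [
        "./python/run-tests",
        f"--python-executables={python_exe}",
        "--modules",
        modules,
    ]


if __name__ == "__main__":
    env = build_env(os.environ, os.getcwd(), sys.platform)
    print("SPARK_HOME =", env["SPARK_HOME"])
    print("command    =", " ".join(build_command()))
```

Passing the resulting list and environment to `subprocess.run(cmd, env=env)` from the Spark checkout would be one way to launch the tests; the sketch stops short of that so it stays side-effect free.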
|
Hmm, I must have missed something, since I always get this "ConnectionRefusedError: [Errno 61] Connection refused" even though I followed the exact steps above. Not sure what host & port it tries to access: I did open up passwordless ssh to localhost. |
|
@sunchao, how does it work if you do one of below:
? As you already assumed, seems like it fails to open a socket with localhost. |
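One way to narrow down a "Connection refused" on localhost is to check, outside of Spark, whether a plain TCP socket can be opened and reached on the loopback interface. The sketch below is an assumption about how one might do that, not the check suggested in the thread; `can_connect_loopback` is a hypothetical helper using only the standard `socket` module.

```python
import socket


def can_connect_loopback(host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Bind an ephemeral port on `host`, connect to it, and echo a few bytes.

    Returns False on any socket-level failure (refused, timeout, bind error).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((host, 0))  # port 0 lets the OS pick a free port
            server.listen(1)
            server.settimeout(timeout)
            port = server.getsockname()[1]
            with socket.create_connection((host, port), timeout=timeout) as client:
                conn, _ = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(b"ping")
                    return conn.recv(4) == b"ping"
    except OSError:
        return False


if __name__ == "__main__":
    for name in ("127.0.0.1", "localhost"):
        print(f"{name}: {can_connect_loopback(name)}")
```

If `127.0.0.1` works but `localhost` does not, hostname resolution would be a plausible suspect; if both fail, something on the machine is likely blocking loopback connections.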
### What changes were proposed in this pull request?
Bump up Apache Arrow version to 6.0.0, from 2.0.0.

### Why are the changes needed?
Apache Spark is still using an old Apache Arrow version, 2.0.0, while 6.0.0 was already released in October 2021. We should pick up improvements and bug fixes from the newer version.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes apache#34613 from sunchao/SPARK-37342.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>