[SPARK-47683][PYTHON][BUILD] Decouple PySpark core API to pyspark.core package #45053
Closed
cc @zhengruifeng @grundprinzip @ueshin @hvanhovell @itholic @WeichenXu123 @mengxr @allisonwang-db @xinrong-meng @gatorsmile @cloud-fan This is ready for a look (before merging, we should wait one more day for the SPIP vote to pass though).
zhengruifeng approved these changes on Apr 3, 2024
itholic approved these changes on Apr 3, 2024
xinrong-meng approved these changes on Apr 3, 2024
ueshin reviewed on Apr 3, 2024
I restored the references for our internal API. Explicitly private attributes starting
Merged to master.
HyukjinKwon added a commit that referenced this pull request on May 2, 2024
…spark-connect` package

### What changes were proposed in this pull request?
This PR is a followup of #45053 that includes `lib/py4j*zip` in the package. Currently it's being picked up by https://github.com/apache/spark/blob/master/python/MANIFEST.in#L26. For other files, we don't create a `deps` directory in `setup.py` for `pyspark-connect`, so they are not included. But `lib` is being included.

### Why are the changes needed?
To exclude unrelated files.

### Does this PR introduce _any_ user-facing change?
No, the main change has not been released yet.

### How was this patch tested?
Manually packaged, and checked the contents via `vi`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46331 from HyukjinKwon/SPARK-47683-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
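The followup above is about keeping unrelated files out of the `pyspark-connect` distribution. As a rough illustration of that kind of packaging exclusion (this is NOT the actual Spark `setup.py`; the package names and patterns below are hypothetical stand-ins), `setuptools.find_packages` can skip JVM-only subpackages:

```python
# Hypothetical sketch of excluding JVM-dependent pieces from a pure-Python
# distribution. "*.jars" / "*.bin" are illustrative pattern names, not
# the real Spark layout.
from setuptools import find_packages

# find_packages walks the given directory and drops any package whose
# dotted name matches one of the exclude patterns.
packages = find_packages(
    where=".",
    exclude=["*.jars", "*.jars.*", "*.bin", "*.bin.*"],
)
print(packages)
```

The real build additionally controls non-package data files (such as the `lib/py4j*zip` mentioned above) through `MANIFEST.in`, which `find_packages` does not cover.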
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request on May 11, 2024
### What changes were proposed in this pull request?
This PR proposes to release a separate `pyspark-connect` package; see also SPIP: Pure Python Package in PyPI (Spark Connect). Today's PySpark package is roughly as follows:
There will be two packages available, `pyspark` and `pyspark-connect`.

**`pyspark`**

Same as today's PySpark, but the core module is factored out to `pyspark.core.*`. The user-facing interface stays the same at `pyspark.*`.

**`pyspark-connect`**

The package after excluding modules that do not support Spark Connect, and excluding jars (that is, `ml` without jars):
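The key property of the split above is that user code keeps importing from `pyspark.*` even though the implementation moves into `pyspark.core.*`. A minimal, self-contained sketch of that re-export pattern (assumed names: `mypkg` stands in for `pyspark`, `mypkg.core` for the new `pyspark.core`; this is not the actual Spark code):

```python
import sys
import types

# The implementation now lives in the core subpackage.
core = types.ModuleType("mypkg.core")

class SparkContext:  # stand-in for a class relocated into the core package
    pass

core.SparkContext = SparkContext

# The top-level package re-exports the symbol under its old name,
# so the public import path is unchanged by the refactoring.
pkg = types.ModuleType("mypkg")
pkg.core = core
pkg.SparkContext = core.SparkContext

sys.modules["mypkg"] = pkg
sys.modules["mypkg.core"] = core

# User code is untouched: the old import path still resolves
# to the relocated class.
from mypkg import SparkContext as UserFacingContext
assert UserFacingContext is core.SparkContext
```

In the real package this would live in `pyspark/__init__.py` as ordinary `from`-imports; the `types.ModuleType` machinery here only exists to keep the sketch runnable without installing anything.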
### Why are the changes needed?
To provide a pure Python library that does not depend on the JVM.
See also SPIP: Pure Python Package in PyPI (Spark Connect).
### Does this PR introduce any user-facing change?
Yes, users can install the pure Python library via `pip install pyspark-connect`.

### How was this patch tested?
Manually ran the basic set of tests against a Spark Connect server:

```shell
./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
```
The tests will be separately added and set up as a scheduled job in CI.
### Was this patch authored or co-authored using generative AI tooling?
No.