[SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file #44920

Closed
wants to merge 11 commits
python/MANIFEST.in (9 changes: 7 additions & 2 deletions)
@@ -14,13 +14,18 @@
# See the License for the specific language governing permissions and
# limitations under the License.

global-exclude *.py[cod] __pycache__ .DS_Store
# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html

graft pyspark
Member

@nchammas Seems like this ends up adding all the tests as well. Could we just include that JSON file alone?

Contributor Author

Yes, graft pulls everything.

We can try to just include what we think we need, but it's probably safer (and easier) in the long run to instead exclude what we don't want to package, like tests. Would that work for you?
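For illustration, an exclude-based approach might look roughly like this (the prune paths below are hypothetical and only show the shape of it, not the exact directories we'd need):

graft pyspark
# Hypothetical: drop test packages and local caches from the built distribution
prune pyspark/tests
prune pyspark/sql/tests
global-exclude *.py[cod] __pycache__ .DS_Store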

Member

I agree that it's safer since we won't miss anything out ... but let's just add the JSON file alone. I think it's more important to get rid of the unrelated files.
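For example, something along these lines would scope it down (the directory and pattern are illustrative; adjust to wherever the JSON actually lives):

# Hypothetical: include only the error-conditions JSON rather than grafting everything
recursive-include pyspark/errors *.json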

Contributor Author

OK. Are you planning to address this in #46331 (or some other PR), or would you like me to take care of it?

Member

I would appreciate it if you could make another PR :-)

recursive-include deps/jars *.jar
graft deps/bin
recursive-include deps/sbin spark-config.sh spark-daemon.sh start-history-server.sh stop-history-server.sh
recursive-include deps/data *.data *.txt
recursive-include deps/licenses *.txt
recursive-include deps/examples *.py
recursive-include lib *.zip
recursive-include pyspark *.pyi py.typed
include README.md

# Note that these commands are processed in the order they appear, so keep
# this exclude at the end.
global-exclude *.py[cod] __pycache__ .DS_Store
python/docs/source/getting_started/install.rst (4 changes: 2 additions & 2 deletions)
@@ -145,15 +145,15 @@ PySpark is included in the distributions available at the `Apache Spark website
You can download a distribution you want from the site. After that, uncompress the tar file into the directory where you want
to install Spark, for example, as below:

.. parsed-literal::
.. code-block:: bash

tar xzvf spark-\ |release|\-bin-hadoop3.tgz

Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
Update ``PYTHONPATH`` environment variable such that it can find the PySpark and Py4J under ``SPARK_HOME/python/lib``.
One example of doing this is shown below:

.. parsed-literal::
.. code-block:: bash

cd spark-\ |release|\-bin-hadoop3
export SPARK_HOME=`pwd`