[SPARK-32190][PYTHON][DOCS] Development - Contribution Guide in PySpark
### What changes were proposed in this pull request?

This PR proposes to document PySpark-specific contribution guides in the "Development" section. Here is the demo for quicker reviewing: https://hyukjin-spark.readthedocs.io/en/stable/development/contributing.html

### Why are the changes needed?

To have a single place for PySpark users, and better documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it is new documentation. See the demo linked above.

### How was this patch tested?

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```

and

```bash
cd python/docs
make clean html
```

Closes #29596 from HyukjinKwon/SPARK-32190.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
1 parent 701e593 · commit 86ca90c · 2 changed files, 98 additions and 0 deletions.
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

=======================
Contributing to PySpark
=======================

There are many types of contributions, for example, helping other users, testing releases, reviewing changes,
contributing documentation, reporting bugs, maintaining JIRA, making code changes, etc.
These are documented in `the general guidelines <http://spark.apache.org/contributing.html>`_.
This page focuses on the additional details specific to PySpark.

Contributing by Testing Releases
--------------------------------

Before an official release, PySpark release candidates are shared on the `dev@spark.apache.org <http://apache-spark-developers-list.1001551.n3.nabble.com/>`_ mailing list to be voted on.
These release candidates can be easily installed via pip. For example, for Spark 3.0.0 RC1, you can install it as below:

.. code-block:: bash

    pip install https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz

The link to the release files, such as ``https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin``, can be found in the vote thread.

Testing and verifying users' existing workloads against release candidates is one of the vital contributions to PySpark,
because it catches problems in users' existing workloads before the official release.
When there is an issue such as a regression, a correctness problem or a performance degradation serious enough to drop the release candidate,
the release candidate is usually dropped and the community focuses on fixing the issue so the fix can be included in the next release candidate.
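The download URL above follows a regular pattern; as a small sketch, it can be derived from the release version and the RC number (the authoritative link is always the one posted in the vote thread):

```python
def rc_download_url(version: str, rc: int) -> str:
    """Build the pip-installable URL for a PySpark release candidate,
    mirroring the pattern shown above (e.g. v3.0.0-rc1-bin)."""
    base = "https://dist.apache.org/repos/dist/dev/spark"
    return f"{base}/v{version}-rc{rc}-bin/pyspark-{version}.tar.gz"

print(rc_download_url("3.0.0", 1))
# -> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
```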


Contributing Documentation Changes
----------------------------------

The release documentation is located under Spark's `docs <https://github.com/apache/spark/tree/master/docs>`_ directory.
`README.md <https://github.com/apache/spark/blob/master/docs/README.md>`_ describes the required dependencies and steps
to generate the documentation. Usually, PySpark documentation is tested with the command below
under the `docs <https://github.com/apache/spark/tree/master/docs>`_ directory:

.. code-block:: bash

    SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch

PySpark uses Sphinx to generate its release documentation. Therefore, if you want to build only the PySpark documentation,
you can build it under the `python/docs <https://github.com/apache/spark/tree/master/python>`_ directory by running:

.. code-block:: bash

    make html

This generates the corresponding HTML files under ``python/docs/build/html``.

Lastly, please make sure that new APIs are documented by manually adding methods and/or classes to the corresponding RST files
under ``python/docs/source/reference``. Otherwise, they will not appear in the PySpark documentation.
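As a sketch, adding a hypothetical function ``new_function`` in ``pyspark.sql.functions`` would mean listing it in the matching reference file; the exact directive layout should follow the existing RST files under ``python/docs/source/reference``, which typically use Sphinx's ``autosummary``:

.. code-block:: rst

    .. currentmodule:: pyspark.sql.functions

    .. autosummary::
        :toctree: api/

        new_function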


Preparing to Contribute Code Changes
------------------------------------

Before starting to work on code in PySpark, it is recommended to read `the general guidelines <http://spark.apache.org/contributing.html>`_.
In addition, there are a couple of notes to keep in mind when contributing code to PySpark:

* Be Pythonic.
* APIs generally match their Scala and Java counterparts.
* PySpark-specific APIs can still be considered as long as they are Pythonic and do not conflict with other existing APIs, for example, decorator usage of UDFs.
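To illustrate what "decorator usage of UDFs" means as an API style, here is a toy sketch of the decorator pattern in plain Python. This is not the real ``pyspark.sql.functions.udf``; the names ``udf`` and ``returnType`` are used here only to mimic its shape:

```python
import functools

def udf(return_type):
    """Toy decorator factory: records a declared return type on the
    wrapped function, mimicking the decorator-style UDF API shape."""
    def wrap(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, **kwargs)
        wrapper.returnType = return_type  # illustrative attribute only
        return wrapper
    return wrap

@udf("string")
def to_upper(s):
    return s.upper()

print(to_upper("spark"))    # -> SPARK
print(to_upper.returnType)  # -> string
```

The decorator form keeps the UDF definition and its declared return type together at the function definition site, which is the Pythonic quality the bullet above refers to.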


Code Style Guide
----------------

Please follow the style of the existing codebase, which is essentially PEP 8 with one exception: lines can be up
to 100 characters in length, not 79.

Note that method and variable names in PySpark use camelCase, a similar case to Python's own ``threading``
library, whose APIs were inspired by Java. PySpark follows `camelCase` for exposed APIs that match their Scala
and Java counterparts. One exception is ``functions.py``, which uses `snake_case` in order to make the APIs
SQL- (and Python-) friendly.

PySpark leverages linters such as `pycodestyle <https://pycodestyle.pycqa.org/en/latest/>`_ and `flake8 <https://flake8.pycqa.org/en/latest/>`_, which ``dev/lint-python`` runs. Therefore, make sure to run that script to double-check your changes.
Development
===========

.. toctree::
   :maxdepth: 2

   contributing