[SPARK-32190][PYTHON][DOCS] Development - Contribution Guide in PySpark
### What changes were proposed in this pull request?

This PR proposes to document PySpark-specific contribution guidelines in the "Development" section.

Here is a demo for quicker review: https://hyukjin-spark.readthedocs.io/en/stable/development/contributing.html

### Why are the changes needed?

To have a single place for PySpark users, and better documentation.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new documentation. See the demo linked above.

### How was this patch tested?

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```

and

```bash
cd python/docs
make clean html
```

Closes #29596 from HyukjinKwon/SPARK-32190.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon committed Sep 1, 2020
1 parent 701e593 commit 86ca90c
Showing 2 changed files with 98 additions and 0 deletions.
93 changes: 93 additions & 0 deletions python/docs/source/development/contributing.rst
@@ -0,0 +1,93 @@
..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

=======================
Contributing to PySpark
=======================

There are many types of contribution, for example, helping other users, testing releases, reviewing changes,
documentation contribution, bug reporting, JIRA maintenance, code changes, etc.
These are documented at `the general guidelines <http://spark.apache.org/contributing.html>`_.
This page focuses on PySpark and includes additional details specifically for PySpark.


Contributing by Testing Releases
--------------------------------

Before the official release, PySpark release candidates are shared on the `dev@spark.apache.org <http://apache-spark-developers-list.1001551.n3.nabble.com/>`_ mailing list for a vote.
These release candidates can be easily installed via pip. For example, for Spark 3.0.0 RC1, you can install it as below:

.. code-block:: bash

    pip install https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz

The link for release files such as ``https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin`` can be found in the vote thread.
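
As a rough illustration, the install URL can be assembled from the Spark version and the release candidate number.
The sketch below assumes that other release candidates follow the same path layout as the 3.0.0 RC1 example;
``rc_pip_url`` is a hypothetical helper, not part of PySpark, and the link posted in the vote thread remains the authoritative source.

```python
def rc_pip_url(version: str, rc: int) -> str:
    """Build a pip-installable URL for a PySpark release candidate.

    Assumption: the path layout matches the 3.0.0 RC1 example in this guide.
    """
    base = "https://dist.apache.org/repos/dist/dev/spark"
    return f"{base}/v{version}-rc{rc}-bin/pyspark-{version}.tar.gz"


if __name__ == "__main__":
    # Print the URL that would be passed to ``pip install``.
    print(rc_pip_url("3.0.0", 1))
```

Always check the resulting URL against the vote thread before installing, since the layout is only inferred from a single example.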

Testing and verifying users' existing workloads against release candidates is one of the vital contributions to PySpark.
It helps catch breakage in users' existing workloads before the official release.
When an issue such as a regression, correctness problem or performance degradation is serious enough to drop the release candidate,
the release candidate is usually dropped and the community focuses on fixing the issue so that the fix is included in the next release candidate.


Contributing Documentation Changes
----------------------------------

The release documentation is located under Spark's `docs <https://github.com/apache/spark/tree/master/docs>`_ directory.
`README.md <https://github.com/apache/spark/blob/master/docs/README.md>`_ describes the required dependencies and steps
to generate the documentation. Usually, PySpark documentation is tested with the command below
under the `docs <https://github.com/apache/spark/tree/master/docs>`_ directory:

.. code-block:: bash

    SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch

PySpark uses Sphinx to generate its release documentation. Therefore, if you want to build only the PySpark documentation,
you can do so under the `python/docs <https://github.com/apache/spark/tree/master/python>`_ directory by running:

.. code-block:: bash

    make html

It generates the corresponding HTML files under ``python/docs/build/html``.

Lastly, please make sure that new APIs are documented by manually adding the methods and/or classes to the corresponding RST files
under ``python/docs/source/reference``. Otherwise, they will not appear in the PySpark documentation.


Preparing to Contribute Code Changes
------------------------------------

Before starting to work on code in PySpark, it is recommended to read `the general guidelines <http://spark.apache.org/contributing.html>`_.
There are a couple of additional notes to keep in mind when contributing code to PySpark:

* Be Pythonic.
* In general, APIs match their Scala and Java counterparts.
* PySpark-specific APIs can still be considered as long as they are Pythonic and do not conflict with other existing APIs, for example, decorator usage of UDFs.


Code Style Guide
----------------

Please follow the style of the existing codebase as is, which is virtually PEP 8 with one exception: lines can be up
to 100 characters in length, not 79.
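
A minimal sketch of what the 100-character rule means in practice is shown below. ``find_long_lines`` and ``MAX_LINE_LENGTH``
are hypothetical names used only for illustration; they are not part of PySpark, and the project's actual check is performed by its linters.

```python
from typing import List, Tuple

# Limit stated in this guide; plain PEP 8 would use 79.
MAX_LINE_LENGTH = 100


def find_long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than ``limit``."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    sample = "x" * 100 + "\n" + "y" * 101
    # Only the second line exceeds the 100-character limit.
    print(find_long_lines(sample))
```

A line of exactly 100 characters passes, while the same line would be flagged under the default PEP 8 limit of 79.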

Note that method and variable names in PySpark are a similar case to the ``threading`` library in Python itself, whose
APIs were inspired by Java: PySpark follows `camelCase` for exposed APIs that match Scala and Java.
The exception is ``functions.py``, which uses `snake_case` in order to make its APIs SQL (and Python) friendly.
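
To make the two conventions concrete, the sketch below classifies a few existing PySpark names by style.
``naming_style`` is a hypothetical helper written for this illustration, and its regular expressions are a simplification rather than a rule enforced by the project.

```python
import re

# camelCase: lowercase start, at least one capitalized hump, no underscores.
_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")
# snake_case: lowercase words joined by at least one underscore.
_SNAKE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$")


def naming_style(name: str) -> str:
    """Classify ``name`` as 'camelCase', 'snake_case', or 'other'."""
    if _CAMEL.match(name):
        return "camelCase"
    if _SNAKE.match(name):
        return "snake_case"
    # Single-word names such as ``select`` fit both conventions.
    return "other"


if __name__ == "__main__":
    # DataFrame methods mirror Scala/Java; functions.py uses snake_case.
    for name in ["withColumn", "groupBy", "to_date", "array_contains", "select"]:
        print(name, "->", naming_style(name))
```

Here ``withColumn`` and ``groupBy`` are DataFrame methods, while ``to_date`` and ``array_contains`` are defined in ``functions.py``.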

PySpark leverages linters such as `pycodestyle <https://pycodestyle.pycqa.org/en/latest/>`_ and `flake8 <https://flake8.pycqa.org/en/latest/>`_, which ``dev/lint-python`` runs. Therefore, make sure to run that script to double check.

5 changes: 5 additions & 0 deletions python/docs/source/development/index.rst
@@ -19,3 +19,8 @@
Development
===========

.. toctree::
   :maxdepth: 2

   contributing
