[SPARK-33003][PYTHON][DOCS] Add type hints guidelines to the documentation

### What changes were proposed in this pull request?

Add type hints guidelines to developer docs.

### Why are the changes needed?

Since it is a new and still somewhat evolving feature, we should provide clear guidelines for potential contributors.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Closes #30094 from zero323/SPARK-33003.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
zero323 authored and HyukjinKwon committed Oct 24, 2020
1 parent 82d500a commit d7f15b0
Showing 1 changed file with 44 additions and 1 deletion: python/docs/source/development/contributing.rst
@@ -77,6 +77,50 @@ There are a couple of additional notes to keep in mind when contributing to code
* Be Pythonic.
* APIs are matched with Scala and Java sides in general.
* PySpark specific APIs can still be considered as long as they are Pythonic and do not conflict with other existing APIs, for example, decorator usage of UDFs (see the sketch below).
* If you extend or modify public API, please adjust corresponding type hints. See `Contributing and Maintaining Type Hints`_ for details.
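For instance, ``pyspark.sql.functions.udf`` can be applied as a decorator, which is Pythonic and has no direct Scala counterpart (a minimal sketch):

.. code-block:: python

    from pyspark.sql.functions import udf

    # udf applied as a decorator, passing the return type directly.
    @udf("string")
    def to_upper(s):
        return s.upper() if s is not None else None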

Contributing and Maintaining Type Hints
----------------------------------------

PySpark type hints are provided in stub files, placed in the same directory as the annotated module; the exception is ``# type: ignore`` comments in modules which don't have their own stubs (tests, examples and non-public API).
As a rule of thumb, only public API is annotated.
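For example, a function defined in a module ``pyspark/example.py`` would be annotated in a sibling stub file ``pyspark/example.pyi`` (the module and names here are hypothetical):

.. code-block:: python

    # pyspark/example.pyi -- hypothetical stub accompanying pyspark/example.py
    from typing import Optional

    def normalize(value: str, encoding: Optional[str] = ...) -> str: ...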

Annotations should, when possible:

* Reflect expectations of the underlying JVM API, to help avoid type-related failures outside the Python interpreter.
* In case of conflict between too broad (``Any``) and too narrow argument annotations, prefer the latter, as long as it covers most of the typical use cases (see the sketch after this list).
* Indicate nonsensical combinations of arguments using ``@overload`` annotations. For example, to indicate that ``*Col`` and ``*Cols`` arguments are mutually exclusive:

  .. code-block:: python

      @overload
      def __init__(
          self,
          *,
          threshold: float = ...,
          inputCol: Optional[str] = ...,
          outputCol: Optional[str] = ...
      ) -> None: ...
      @overload
      def __init__(
          self,
          *,
          thresholds: Optional[List[float]] = ...,
          inputCols: Optional[List[str]] = ...,
          outputCols: Optional[List[str]] = ...
      ) -> None: ...

* Be compatible with the current stable MyPy release.
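To illustrate the point above about overly broad annotations, compare the two variants of the signature below (``with_weight`` is a hypothetical function, not an existing PySpark API):

.. code-block:: python

    from typing import Any, Union

    from pyspark.sql import Column

    # Too broad: ``Any`` accepts every argument and lets type errors through.
    def with_weight(col: Any) -> Column: ...

    # Preferred: narrow enough to catch mistakes, while still covering
    # the typical use cases (a column object or a column name).
    def with_weight(col: Union[Column, str]) -> Column: ...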


Complex supporting type definitions should be placed in dedicated ``_typing.pyi`` stubs. See for example `pyspark.sql._typing.pyi <https://github.com/apache/spark/blob/master/python/pyspark/sql/_typing.pyi>`_.
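Such a stub typically collects aliases shared across a package's annotations, along the lines of this minimal sketch (loosely modeled on ``pyspark.sql._typing.pyi``):

.. code-block:: python

    # _typing.pyi -- shared type aliases for a package's stubs.
    from typing import Union

    from pyspark.sql.column import Column

    # Either a Column object or the name of a column.
    ColumnOrName = Union[Column, str]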

Annotations can be validated using the ``dev/lint-python`` script or by invoking mypy directly:

.. code-block:: bash

    mypy --config python/mypy.ini python/pyspark

Code Style Guide
@@ -90,4 +134,3 @@ the APIs were inspired by Java. PySpark also follows `camelCase` for exposed API
There is an exception, ``functions.py``, which uses `snake_case`. This was done to make the APIs SQL (and Python) friendly.

PySpark leverages linters such as `pycodestyle <https://pycodestyle.pycqa.org/en/latest/>`_ and `flake8 <https://flake8.pycqa.org/en/latest/>`_, which ``dev/lint-python`` runs. Therefore, make sure to run that script to double-check.
