[SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters. #28559

Closed · wants to merge 2 commits
2 changes: 1 addition & 1 deletion python/pyspark/ml/clustering.py
@@ -802,7 +802,7 @@ def computeCost(self, dataset):
Computes the sum of squared distances between the input points
and their corresponding cluster centers.

- ..note:: Deprecated in 3.0.0. It will be removed in future versions. Use
+ .. note:: Deprecated in 3.0.0. It will be removed in future versions. Use
ClusteringEvaluator instead. You can also get the cost on the training dataset in the
summary.
"""
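For context, the deprecation note points users at ClusteringEvaluator and the model summary. A minimal sketch of both alternatives, assuming a local SparkSession and toy data (names and data are illustrative, not part of the patch):

```python
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

spark = SparkSession.builder.getOrCreate()
dataset = spark.createDataFrame(
    [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
     (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
    ["features"])

model = KMeans(k=2, seed=1).fit(dataset)

# Instead of the deprecated computeCost():
silhouette = ClusteringEvaluator().evaluate(model.transform(dataset))  # evaluator-based metric
training_cost = model.summary.trainingCost  # cost on the training dataset, via the summary
```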
1 change: 1 addition & 0 deletions python/pyspark/ml/util.py
@@ -563,6 +563,7 @@ def loadParamsInstance(path, sc):
class HasTrainingSummary(object):
"""
Base class for models that provides Training summary.
+
.. versionadded:: 3.0.0
"""

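The only change in this file is the added blank line: reStructuredText generally needs a blank line between the summary paragraph and the `.. versionadded::` directive, otherwise the marker is not parsed as a directive. A small sketch of the pattern with a hypothetical class:

```python
class ExampleModel(object):
    """
    One-line summary of the class.

    .. versionadded:: 3.0.0
    """
    # Without the blank line above the directive, the ".. versionadded::" line
    # tends to be folded into the summary paragraph instead of rendering as a
    # separate "New in version 3.0.0" note.
```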
4 changes: 2 additions & 2 deletions python/pyspark/mllib/util.py
@@ -372,7 +372,7 @@ def save(self, sc, path):
* human-readable (JSON) model metadata to path/metadata/
* Parquet formatted data to path/data/

- The model may be loaded using py:meth:`Loader.load`.
+ The model may be loaded using :py:meth:`Loader.load`.

:param sc: Spark context used to save model data.
:param path: Path specifying the directory in which to save
@@ -412,7 +412,7 @@ class Loader(object):
def load(cls, sc, path):
"""
Load a model from the given path. The model should have been
- saved using py:meth:`Saveable.save`.
+ saved using :py:meth:`Saveable.save`.

:param sc: Spark context used for loading model files.
:param path: Path specifying the directory to which the model
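For reference, `Saveable.save` and `Loader.load` are the round-trip pair used by the RDD-based MLlib models. A hedged sketch with an illustrative model and path (not part of the patch):

```python
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans, KMeansModel

sc = SparkContext.getOrCreate()
path = "/tmp/example_kmeans_model"  # illustrative path

rdd = sc.parallelize([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])
model = KMeans.train(rdd, k=2, seed=1)

model.save(sc, path)                     # Saveable.save: JSON metadata + Parquet data under `path`
same_model = KMeansModel.load(sc, path)  # Loader.load: restore the saved model
```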
6 changes: 3 additions & 3 deletions python/pyspark/sql/dataframe.py
@@ -2138,7 +2138,7 @@ def drop(self, *cols):

@ignore_unicode_prefix
def toDF(self, *cols):
"""Returns a new class:`DataFrame` that with new specified column names
"""Returns a new :class:`DataFrame` that with new specified column names

:param cols: list of new column names (string)

@@ -2150,9 +2150,9 @@ def toDF(self, *cols):

@since(3.0)
def transform(self, func):
"""Returns a new class:`DataFrame`. Concise syntax for chaining custom transformations.
"""Returns a new :class:`DataFrame`. Concise syntax for chaining custom transformations.

- :param func: a function that takes and returns a class:`DataFrame`.
+ :param func: a function that takes and returns a :class:`DataFrame`.
Review comment (Member):

Could you fix classification.py and regression.py, too?

pyspark/ml/classification.py:    To be mixed in with class:`pyspark.ml.JavaModel`
pyspark/ml/regression.py:    To be mixed in with class:`pyspark.ml.JavaModel`


>>> from pyspark.sql.functions import col
>>> df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
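The fixes in this file only repair the `:class:` cross-reference role (the reviewer asks for the same fix in classification.py and regression.py); the methods themselves behave as before. A short usage sketch of `toDF` and `transform`, mirroring the existing doctest:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])

# toDF: same rows, new column names.
renamed = df.toDF("id", "value")

# transform (Spark 3.0+): chain a custom DataFrame -> DataFrame function.
def cast_all_to_int(input_df):
    return input_df.select([col(c).cast("int") for c in input_df.columns])

result = df.transform(cast_all_to_int)
```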
14 changes: 7 additions & 7 deletions python/pyspark/sql/readwriter.py
@@ -223,15 +223,15 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param mode: allows a mode for dealing with corrupt records during parsing. If None is
set, it uses the default value, ``PERMISSIVE``.

- * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+ * ``PERMISSIVE``: when it meets a corrupted record, puts the malformed string \
into a field configured by ``columnNameOfCorruptRecord``, and sets malformed \
fields to ``null``. To keep corrupt records, an user can set a string type \
field named ``columnNameOfCorruptRecord`` in an user-defined schema. If a \
schema does not have the field, it drops corrupt records during parsing. \
When inferring a schema, it implicitly adds a ``columnNameOfCorruptRecord`` \
field in an output schema.
- * ``DROPMALFORMED`` : ignores the whole corrupted records.
- * ``FAILFAST`` : throws an exception when it meets corrupted records.
+ * ``DROPMALFORMED``: ignores the whole corrupted records.
+ * ``FAILFAST``: throws an exception when it meets corrupted records.

:param columnNameOfCorruptRecord: allows renaming the new field having malformed string
created by ``PERMISSIVE`` mode. This overrides
@@ -470,7 +470,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
be controlled by ``spark.sql.csv.parser.columnPruning.enabled``
(enabled by default).

- * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+ * ``PERMISSIVE``: when it meets a corrupted record, puts the malformed string \
into a field configured by ``columnNameOfCorruptRecord``, and sets malformed \
fields to ``null``. To keep corrupt records, an user can set a string type \
field named ``columnNameOfCorruptRecord`` in an user-defined schema. If a \
@@ -479,8 +479,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
When it meets a record having fewer tokens than the length of the schema, \
sets ``null`` to extra fields. When the record has more tokens than the \
length of the schema, it drops extra tokens.
- * ``DROPMALFORMED`` : ignores the whole corrupted records.
- * ``FAILFAST`` : throws an exception when it meets corrupted records.
+ * ``DROPMALFORMED``: ignores the whole corrupted records.
+ * ``FAILFAST``: throws an exception when it meets corrupted records.

:param columnNameOfCorruptRecord: allows renaming the new field having malformed string
created by ``PERMISSIVE`` mode. This overrides
@@ -830,7 +830,7 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options):
def insertInto(self, tableName, overwrite=None):
"""Inserts the content of the :class:`DataFrame` to the specified table.

- It requires that the schema of the class:`DataFrame` is the same as the
+ It requires that the schema of the :class:`DataFrame` is the same as the
schema of the table.

Optionally overwriting any existing data.
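The colon-spacing changes above only affect how the mode bullets render in the docs; the parse modes themselves work as described. A hedged sketch of passing them to the JSON reader (the path and corrupt-record column name are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/example.json"  # illustrative input path

# PERMISSIVE (default): keep malformed records in a designated string column.
permissive = (spark.read
              .option("mode", "PERMISSIVE")
              .option("columnNameOfCorruptRecord", "_corrupt_record")
              .json(path))

# DROPMALFORMED: drop corrupted records entirely.
dropped = spark.read.option("mode", "DROPMALFORMED").json(path)

# FAILFAST: raise an error on the first corrupted record.
strict = spark.read.option("mode", "FAILFAST").json(path)
```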
26 changes: 13 additions & 13 deletions python/pyspark/sql/streaming.py
@@ -461,15 +461,15 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param mode: allows a mode for dealing with corrupt records during parsing. If None is
set, it uses the default value, ``PERMISSIVE``.

- * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+ * ``PERMISSIVE``: when it meets a corrupted record, puts the malformed string \
into a field configured by ``columnNameOfCorruptRecord``, and sets malformed \
fields to ``null``. To keep corrupt records, an user can set a string type \
field named ``columnNameOfCorruptRecord`` in an user-defined schema. If a \
schema does not have the field, it drops corrupt records during parsing. \
When inferring a schema, it implicitly adds a ``columnNameOfCorruptRecord`` \
field in an output schema.
- * ``DROPMALFORMED`` : ignores the whole corrupted records.
- * ``FAILFAST`` : throws an exception when it meets corrupted records.
+ * ``DROPMALFORMED``: ignores the whole corrupted records.
+ * ``FAILFAST``: throws an exception when it meets corrupted records.

:param columnNameOfCorruptRecord: allows renaming the new field having malformed string
created by ``PERMISSIVE`` mode. This overrides
@@ -707,7 +707,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
:param mode: allows a mode for dealing with corrupt records during parsing. If None is
set, it uses the default value, ``PERMISSIVE``.

- * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+ * ``PERMISSIVE``: when it meets a corrupted record, puts the malformed string \
into a field configured by ``columnNameOfCorruptRecord``, and sets malformed \
fields to ``null``. To keep corrupt records, an user can set a string type \
field named ``columnNameOfCorruptRecord`` in an user-defined schema. If a \
@@ -716,8 +716,8 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
When it meets a record having fewer tokens than the length of the schema, \
sets ``null`` to extra fields. When the record has more tokens than the \
length of the schema, it drops extra tokens.
- * ``DROPMALFORMED`` : ignores the whole corrupted records.
- * ``FAILFAST`` : throws an exception when it meets corrupted records.
+ * ``DROPMALFORMED``: ignores the whole corrupted records.
+ * ``FAILFAST``: throws an exception when it meets corrupted records.

:param columnNameOfCorruptRecord: allows renaming the new field having malformed string
created by ``PERMISSIVE`` mode. This overrides
@@ -795,11 +795,11 @@ def outputMode(self, outputMode):

Options include:

- * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to
+ * `append`: Only the new rows in the streaming DataFrame/Dataset will be written to
the sink
- * `complete`:All the rows in the streaming DataFrame/Dataset will be written to the sink
+ * `complete`: All the rows in the streaming DataFrame/Dataset will be written to the sink
every time these is some updates
- * `update`:only the rows that were updated in the streaming DataFrame/Dataset will be
+ * `update`: only the rows that were updated in the streaming DataFrame/Dataset will be
written to the sink every time there are some updates. If the query doesn't contain
aggregations, it will be equivalent to `append` mode.

@@ -1170,11 +1170,11 @@ def start(self, path=None, format=None, outputMode=None, partitionBy=None, query
:param outputMode: specifies how data of a streaming DataFrame/Dataset is written to a
streaming sink.

- * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to the
+ * `append`: Only the new rows in the streaming DataFrame/Dataset will be written to the
sink
- * `complete`:All the rows in the streaming DataFrame/Dataset will be written to the sink
- every time these is some updates
- * `update`:only the rows that were updated in the streaming DataFrame/Dataset will be
+ * `complete`: All the rows in the streaming DataFrame/Dataset will be written to the
+ sink every time these is some updates
+ * `update`: only the rows that were updated in the streaming DataFrame/Dataset will be
written to the sink every time there are some updates. If the query doesn't contain
aggregations, it will be equivalent to `append` mode.
:param partitionBy: names of partitioning columns
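Same kind of spacing fix for the output-mode bullets. For reference, a minimal structured-streaming sketch that exercises `outputMode` and `start`, using the built-in rate source and console sink purely for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A toy streaming DataFrame from the built-in "rate" source.
stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# append: only newly arrived rows are written to the sink.
# (complete/update are mainly relevant for queries with aggregations.)
query = (stream.writeStream
         .outputMode("append")
         .format("console")
         .start())

query.awaitTermination(5)  # let the query run briefly, then stop it
query.stop()
```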