[SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrings to the top in PySpark examples

## What changes were proposed in this pull request?

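This PR proposes to relocate the module docstrings in the PySpark examples to the top of each file. In several examples the docstring was placed after the imports, which appears to be a mistake: Python treats a string literal as the module docstring only when it is the first statement in the module. For example, the following call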

```python
>>> help(aft_survival_regression)
```

shows the module description as follows:

**Before**

```
Help on module aft_survival_regression:

NAME
    aft_survival_regression

...

DESCRIPTION
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

...

(END)
```

**After**

```
Help on module aft_survival_regression:

NAME
    aft_survival_regression

...

DESCRIPTION
    An example demonstrating aft survival regression.
    Run with:
      bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py

(END)
```
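
To illustrate why the placement matters, here is a minimal sketch using only the standard library. The two module sources (`MISPLACED` and `RELOCATED`) and the helper `module_doc` are hypothetical stand-ins introduced for illustration, not the actual Spark example files. Python records a string literal as the module docstring only when it is the first statement. When a module has no docstring, `help()` appears to fall back to the leading comments, which would explain the license text in the **Before** output.

```python
import ast

# Hypothetical module sources for illustration only.
MISPLACED = '''
# license header comment
from __future__ import print_function

import sys

"""
An example module.
"""
'''

RELOCATED = '''
# license header comment
"""
An example module.
"""
from __future__ import print_function

import sys
'''


def module_doc(source):
    """Return the docstring Python would record for the given module source."""
    return ast.get_docstring(ast.parse(source))


print(module_doc(MISPLACED))  # None: the string comes after the imports
print(module_doc(RELOCATED))  # An example module.
```

Comments before the docstring do not count as statements, so the license header can stay where it is. A `from __future__` import may follow the docstring, which is the layout the relocated examples use.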

## How was this patch tested?

Manually checked.
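
The patch itself was only checked manually. As a rough sketch of how such a check could be automated, the helper below walks a directory and reports Python files whose first statement is not a docstring. The function name `files_missing_module_docstring` is an assumption made up for this illustration, and the sketch assumes every file parses under the interpreter running the check.

```python
import ast
import os


def files_missing_module_docstring(root):
    """Yield paths of .py files under `root` that have no module docstring."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                tree = ast.parse(handle.read(), filename=path)
            # get_docstring returns None unless the first statement is a string.
            if ast.get_docstring(tree) is None:
                yield path
```

It could be run from the Spark root as, for example, `list(files_missing_module_docstring("examples/src/main/python"))`. An empty list would mean every example exposes a module docstring.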

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20416 from HyukjinKwon/module-docstring-example.

(cherry picked from commit b8c32dc)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
HyukjinKwon committed Jan 28, 2018
1 parent 3b6fc28 commit 8ff0cc48b1b45ed41914822ffaaf8de8dff87b72
@@ -15,13 +15,6 @@
# limitations under the License.
#

from __future__ import print_function

import sys

from functools import reduce
from pyspark.sql import SparkSession

"""
Read data file users.avro in local Spark distro:
@@ -50,6 +43,13 @@
{u'favorite_color': None, u'name': u'Alyssa'}
{u'favorite_color': u'red', u'name': u'Ben'}
"""
from __future__ import print_function

import sys

from functools import reduce
from pyspark.sql import SparkSession

if __name__ == "__main__":
    if len(sys.argv) != 2 and len(sys.argv) != 3:
        print("""
@@ -15,6 +15,11 @@
# limitations under the License.
#

"""
An example demonstrating aft survival regression.
Run with:
bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py
"""
from __future__ import print_function

# $example on$
@@ -23,12 +28,6 @@
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating aft survival regression.
Run with:
bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py
"""

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
@@ -15,19 +15,18 @@
# limitations under the License.
#

"""
An example demonstrating bisecting k-means clustering.
Run with:
bin/spark-submit examples/src/main/python/ml/bisecting_k_means_example.py
"""
from __future__ import print_function

# $example on$
from pyspark.ml.clustering import BisectingKMeans
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating bisecting k-means clustering.
Run with:
bin/spark-submit examples/src/main/python/ml/bisecting_k_means_example.py
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,7 +15,11 @@
# limitations under the License.
#


"""
An example demonstrating BucketedRandomProjectionLSH.
Run with:
bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh_example.py
"""
from __future__ import print_function

# $example on$
@@ -25,12 +29,6 @@
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating BucketedRandomProjectionLSH.
Run with:
bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh_example.py
"""

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
@@ -15,6 +15,11 @@
# limitations under the License.
#

"""
An example for Chi-square hypothesis testing.
Run with:
bin/spark-submit examples/src/main/python/ml/chi_square_test_example.py
"""
from __future__ import print_function

from pyspark.sql import SparkSession
@@ -23,11 +28,6 @@
from pyspark.ml.stat import ChiSquareTest
# $example off$

"""
An example for Chi-square hypothesis testing.
Run with:
bin/spark-submit examples/src/main/python/ml/chi_square_test_example.py
"""
if __name__ == "__main__":
    spark = SparkSession \
        .builder \
@@ -15,6 +15,11 @@
# limitations under the License.
#

"""
An example for computing correlation matrix.
Run with:
bin/spark-submit examples/src/main/python/ml/correlation_example.py
"""
from __future__ import print_function

# $example on$
@@ -23,11 +28,6 @@
# $example off$
from pyspark.sql import SparkSession

"""
An example for computing correlation matrix.
Run with:
bin/spark-submit examples/src/main/python/ml/correlation_example.py
"""
if __name__ == "__main__":
    spark = SparkSession \
        .builder \
@@ -15,6 +15,13 @@
# limitations under the License.
#

"""
A simple example demonstrating model selection using CrossValidator.
This example also demonstrates how Pipelines are Estimators.
Run with:
bin/spark-submit examples/src/main/python/ml/cross_validator.py
"""
from __future__ import print_function

# $example on$
@@ -26,14 +33,6 @@
# $example off$
from pyspark.sql import SparkSession

"""
A simple example demonstrating model selection using CrossValidator.
This example also demonstrates how Pipelines are Estimators.
Run with:
bin/spark-submit examples/src/main/python/ml/cross_validator.py
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,16 +15,15 @@
# limitations under the License.
#

# $example on$
from pyspark.ml.fpm import FPGrowth
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating FPGrowth.
Run with:
bin/spark-submit examples/src/main/python/ml/fpgrowth_example.py
"""
# $example on$
from pyspark.ml.fpm import FPGrowth
# $example off$
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession\
@@ -15,19 +15,18 @@
# limitations under the License.
#

"""
A simple example demonstrating Gaussian Mixture Model (GMM).
Run with:
bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
"""
from __future__ import print_function

# $example on$
from pyspark.ml.clustering import GaussianMixture
# $example off$
from pyspark.sql import SparkSession

"""
A simple example demonstrating Gaussian Mixture Model (GMM).
Run with:
bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,19 +15,18 @@
# limitations under the License.
#

"""
An example demonstrating generalized linear regression.
Run with:
bin/spark-submit examples/src/main/python/ml/generalized_linear_regression_example.py
"""
from __future__ import print_function

from pyspark.sql import SparkSession
# $example on$
from pyspark.ml.regression import GeneralizedLinearRegression
# $example off$

"""
An example demonstrating generalized linear regression.
Run with:
bin/spark-submit examples/src/main/python/ml/generalized_linear_regression_example.py
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,16 +15,15 @@
# limitations under the License.
#

# $example on$
from pyspark.ml.feature import Imputer
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating Imputer.
Run with:
bin/spark-submit examples/src/main/python/ml/imputer_example.py
"""
# $example on$
from pyspark.ml.feature import Imputer
# $example off$
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession\
@@ -17,6 +17,9 @@

"""
Isotonic Regression Example.
Run with:
bin/spark-submit examples/src/main/python/ml/isotonic_regression_example.py
"""
from __future__ import print_function

@@ -25,12 +28,6 @@
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating isotonic regression.
Run with:
bin/spark-submit examples/src/main/python/ml/isotonic_regression_example.py
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,6 +15,13 @@
# limitations under the License.
#

"""
An example demonstrating k-means clustering.
Run with:
bin/spark-submit examples/src/main/python/ml/kmeans_example.py
This example requires NumPy (http://www.numpy.org/).
"""
from __future__ import print_function

# $example on$
@@ -24,14 +31,6 @@

from pyspark.sql import SparkSession

"""
An example demonstrating k-means clustering.
Run with:
bin/spark-submit examples/src/main/python/ml/kmeans_example.py
This example requires NumPy (http://www.numpy.org/).
"""

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
@@ -15,20 +15,18 @@
# limitations under the License.
#


"""
An example demonstrating LDA.
Run with:
bin/spark-submit examples/src/main/python/ml/lda_example.py
"""
from __future__ import print_function

# $example on$
from pyspark.ml.clustering import LDA
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating LDA.
Run with:
bin/spark-submit examples/src/main/python/ml/lda_example.py
"""

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
@@ -15,19 +15,18 @@
# limitations under the License.
#

"""
An example demonstrating Logistic Regression Summary.
Run with:
bin/spark-submit examples/src/main/python/ml/logistic_regression_summary_example.py
"""
from __future__ import print_function

# $example on$
from pyspark.ml.classification import LogisticRegression
# $example off$
from pyspark.sql import SparkSession

"""
An example demonstrating Logistic Regression Summary.
Run with:
bin/spark-submit examples/src/main/python/ml/logistic_regression_summary_example.py
"""

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
