[SPARK-15031][SPARK-15134][EXAMPLE][DOC] Use SparkSession and update indent in examples #13050
Conversation
cc @andrewor14
Test build #58370 has finished for PR 13050 at commit
    exit(1)
-   sc = SparkContext(appName="PythonSimpleParamsExample")
-   sqlContext = SQLContext(sc)
+   spark = SparkSession \
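For context, this change replaces the Spark 1.x `SparkContext` + `SQLContext` pair with the unified Spark 2.0 `SparkSession` builder. A minimal sketch of the new-style entry point (the helper name is illustrative, not from the PR; running it requires `pyspark`):

```python
try:
    # SparkSession is the unified Spark 2.0+ entry point, replacing the
    # separate SparkContext and SQLContext used by the old examples.
    from pyspark.sql import SparkSession
    HAVE_PYSPARK = True
except ImportError:  # pyspark not installed; the sketch below still parses
    HAVE_PYSPARK = False


def build_session(app_name):
    """Build (or reuse) a SparkSession, Spark 2.0 style."""
    return SparkSession \
        .builder \
        .appName(app_name) \
        .getOrCreate()

# Usage (requires a local Spark installation):
#   spark = build_session("PythonSimpleParamsExample")
#   sc = spark.sparkContext  # the underlying SparkContext stays reachable
#   spark.stop()
```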
I wonder if this example works. I remember it was not fixed in #12809 because it did not work.
You are right. `model1.extractParamMap()` and `model2.extractParamMap()` are always empty, and it fails with:
16/05/12 09:58:48 WARN TaskSetManager: Lost task 2.0 in stage 40.0 (TID 152, localhost): java.lang.IllegalArgumentException: requirement failed: Logistic Regression getThreshold found inconsistent values for threshold (0.5) and thresholds (equivalent to 0.55)
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.classification.LogisticRegressionParams$class.checkThresholdConsistency(LogisticRegression.scala:143)
I will revert this change. Thanks!
Force-pushed from 3cbca5b to f8d51b9.
Test build #58431 has finished for PR 13050 at commit
LGTM, but let's retest this please just in case. There have been a lot of build breaks related to changes like these lately.
Test build #58446 has finished for PR 13050 at commit
Merging into master and 2.0.
…indent in examples

## What changes were proposed in this pull request?

1. Use `SparkSession` according to [SPARK-15031](https://issues.apache.org/jira/browse/SPARK-15031)
2. Update indent for `SparkContext` according to [SPARK-15134](https://issues.apache.org/jira/browse/SPARK-15134)
3. BTW, remove some duplicate spaces and add a missing '.'

## How was this patch tested?

Manual tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #13050 from zhengruifeng/use_sparksession.

(cherry picked from commit 9e266d0)
Signed-off-by: Andrew Or <andrew@databricks.com>
…with SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use `SparkSession` by #12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. It seems `aft_survival_regression.py` was then changed by #13050, but `simple_params_example.py` was not yet. This PR corrects the example and makes it use `SparkSession`.

In more detail, it seems `threshold` was replaced with `thresholds` here and there by 5a23213. However, when it calls `lr.fit(training, paramMap)`, the paramMap overwrites the values. So `threshold` was 0.5 while `thresholds` became equivalent to 0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`). According to the comment at https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61, this is not allowed. So, in this PR, it sets the equivalent value so that this does not throw an exception.

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13135 from HyukjinKwon/SPARK-15031.

(cherry picked from commit e2ec32d)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
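The inconsistency described in the commit message can be reproduced with plain arithmetic. A minimal sketch (pure Python, no Spark required; the `[0.45, 0.55]` vector is an illustrative assumption chosen to match the 0.55 value in the error message, not taken from the PR diff):

```python
def threshold_from_thresholds(thresholds):
    """Binary-case threshold implied by a two-element thresholds vector,
    mirroring the relation threshold = 1 / (1 + thresholds(0) / thresholds(1))
    that LogisticRegression's consistency check enforces."""
    t0, t1 = thresholds
    return 1.0 / (1.0 + t0 / t1)


# A thresholds vector such as [0.45, 0.55] implies a threshold of 0.55,
# which conflicts with an explicitly set threshold of 0.5 and triggers
# the IllegalArgumentException shown earlier in this thread:
implied = threshold_from_thresholds([0.45, 0.55])
assert abs(implied - 0.55) < 1e-9

# Setting threshold to the implied value (or using a symmetric thresholds
# pair) keeps the two params consistent and avoids the exception:
assert abs(threshold_from_thresholds([0.5, 0.5]) - 0.5) < 1e-9
```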