[SPARK-15031][SPARK-15134][EXAMPLE][DOC] Use SparkSession and update indent in examples #13050
Conversation
cc @andrewor14
Test build #58370 has finished for PR 13050 at commit
    exit(1)
-   sc = SparkContext(appName="PythonSimpleParamsExample")
-   sqlContext = SQLContext(sc)
+   spark = SparkSession \
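For context, this change replaces the Spark 1.x `SparkContext` + `SQLContext` pair with the unified Spark 2.0 `SparkSession` builder. A minimal sketch of the new-style entry point (the helper name is illustrative, not from the PR; running it requires `pyspark`):

```python
try:
    # SparkSession is the unified Spark 2.0+ entry point, replacing the
    # separate SparkContext and SQLContext used by the old examples.
    from pyspark.sql import SparkSession
    HAVE_PYSPARK = True
except ImportError:  # pyspark not installed; the sketch below still parses
    HAVE_PYSPARK = False


def build_session(app_name):
    """Build (or reuse) a SparkSession, Spark 2.0 style."""
    return SparkSession \
        .builder \
        .appName(app_name) \
        .getOrCreate()

# Usage (requires a local Spark installation):
#   spark = build_session("PythonSimpleParamsExample")
#   sc = spark.sparkContext  # the underlying SparkContext stays reachable
#   spark.stop()
```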
I wonder if this example works. I remember it was not fixed in #12809 because it did not work.
You are right. `model1.extractParamMap()` and `model2.extractParamMap()` are always empty, and it fails with:
16/05/12 09:58:48 WARN TaskSetManager: Lost task 2.0 in stage 40.0 (TID 152, localhost): java.lang.IllegalArgumentException: requirement failed: Logistic Regression getThreshold found inconsistent values for threshold (0.5) and thresholds (equivalent to 0.55)
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.classification.LogisticRegressionParams$class.checkThresholdConsistency(LogisticRegression.scala:143)
I will revert this change. Thanks!
Force-pushed from 3cbca5b to f8d51b9.
Test build #58431 has finished for PR 13050 at commit
LGTM, but let's retest this please just in case. There have been a lot of build breaks related to changes like these lately.
Test build #58446 has finished for PR 13050 at commit
Merging into master and 2.0.
…indent in examples

## What changes were proposed in this pull request?

1. Use `SparkSession` according to [SPARK-15031](https://issues.apache.org/jira/browse/SPARK-15031)
2. Update indent for `SparkContext` according to [SPARK-15134](https://issues.apache.org/jira/browse/SPARK-15134)
3. BTW, remove some duplicate spaces and add a missing '.'

## How was this patch tested?

Manual tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #13050 from zhengruifeng/use_sparksession.

(cherry picked from commit 9e266d0)
Signed-off-by: Andrew Or <andrew@databricks.com>
…with SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use `SparkSession` by #12809. That PR noted that both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. It seems `aft_survival_regression.py` was then changed by #13050, but `simple_params_example.py` was not yet. This PR corrects the example and makes it use `SparkSession`.

In more detail, it seems `threshold` was replaced with `thresholds` here and there by 5a23213. However, when it calls `lr.fit(training, paramMap)`, the paramMap overwrites the values. So `threshold` was 0.5 while `thresholds` became equivalent to 0.55 (by `1 / (1 + thresholds(0) / thresholds(1))`). According to the comment at https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61, this is not allowed. So, in this PR, it sets the equivalent value so that this does not throw an exception.

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13135 from HyukjinKwon/SPARK-15031.

(cherry picked from commit e2ec32d)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
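The inconsistency described in the commit message can be reproduced with plain arithmetic. A minimal sketch (pure Python, no Spark required; the `[0.45, 0.55]` vector is an illustrative assumption chosen to match the 0.55 value in the error message, not taken from the PR diff):

```python
def threshold_from_thresholds(thresholds):
    """Binary-case threshold implied by a two-element thresholds vector,
    mirroring the relation threshold = 1 / (1 + thresholds(0) / thresholds(1))
    that LogisticRegression's consistency check enforces."""
    t0, t1 = thresholds
    return 1.0 / (1.0 + t0 / t1)


# A thresholds vector such as [0.45, 0.55] implies a threshold of 0.55,
# which conflicts with an explicitly set threshold of 0.5 and triggers
# the IllegalArgumentException shown earlier in this thread:
implied = threshold_from_thresholds([0.45, 0.55])
assert abs(implied - 0.55) < 1e-9

# Setting threshold to the implied value (or using a symmetric thresholds
# pair) keeps the two params consistent and avoids the exception:
assert abs(threshold_from_thresholds([0.5, 0.5]) - 0.5) < 1e-9
```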