
[SPARK-15031][SPARK-15134][EXAMPLE][DOC] Use SparkSession and update indent in examples #13050

Closed
wants to merge 3 commits into from


zhengruifeng
Contributor

zhengruifeng commented May 11, 2016

What changes were proposed in this pull request?

1. Use SparkSession according to SPARK-15031
2. Update indentation for SparkContext according to SPARK-15134
3. Also, remove some duplicate spaces and add missing periods ('.')

How was this patch tested?

manual tests

@zhengruifeng
Contributor Author

cc @andrewor14

@SparkQA

SparkQA commented May 11, 2016

Test build #58370 has finished for PR 13050 at commit 3cbca5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```python
exit(1)
sc = SparkContext(appName="PythonSimpleParamsExample")
sqlContext = SQLContext(sc)
spark = SparkSession \
```
Member


I wonder whether this example works. I remember it was not updated in #12809 because it did not work.

Contributor Author


You are right. `model1.extractParamMap()` and `model2.extractParamMap()` are always empty, and the example fails with:

```
16/05/12 09:58:48 WARN TaskSetManager: Lost task 2.0 in stage 40.0 (TID 152, localhost): java.lang.IllegalArgumentException: requirement failed: Logistic Regression getThreshold found inconsistent values for threshold (0.5) and thresholds (equivalent to 0.55)
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.ml.classification.LogisticRegressionParams$class.checkThresholdConsistency(LogisticRegression.scala:143)
```

I will revert this change. Thanks!

@SparkQA

SparkQA commented May 12, 2016

Test build #58431 has finished for PR 13050 at commit f8d51b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

LGTM, but let's retest this please just in case. There have been a lot of build breaks related to changes like these lately.

@SparkQA

SparkQA commented May 12, 2016

Test build #58446 has finished for PR 13050 at commit f8d51b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Merging into master and 2.0.

asfgit pushed a commit that referenced this pull request May 12, 2016
…indent in examples

## What changes were proposed in this pull request?
1. Use `SparkSession` according to [SPARK-15031](https://issues.apache.org/jira/browse/SPARK-15031)
2. Update indentation for `SparkContext` according to [SPARK-15134](https://issues.apache.org/jira/browse/SPARK-15134)
3. Also, remove some duplicate spaces and add missing periods ('.')

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #13050 from zhengruifeng/use_sparksession.

(cherry picked from commit 9e266d0)
Signed-off-by: Andrew Or <andrew@databricks.com>
asfgit closed this in 9e266d0 May 12, 2016
zhengruifeng deleted the use_sparksession branch May 18, 2016 02:52
asfgit pushed a commit that referenced this pull request May 19, 2016
…with SparkSession

## What changes were proposed in this pull request?

It seems most of the Python examples were changed to use SparkSession by #12809. That PR said the two examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. It seems `aft_survival_regression.py` was changed by #13050, but `simple_params_example.py` has not been changed yet.

This PR corrects the example and makes it use SparkSession.

In more detail, it seems `threshold` was replaced with `thresholds` in several places by 5a23213. However, calling `lr.fit(training, paramMap)` overwrites the values, so `threshold` was 0.5 while `thresholds` became equivalent to 0.55 (via `1 / (1 + thresholds(0) / thresholds(1))`).

According to the comment at https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61, this is not allowed.

So this PR sets the equivalent value so that no exception is thrown.
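The consistency rule described above can be sketched in plain Python. This is a hypothetical mirror of the check, not Spark's code: the dict-based param map, the helper names `equivalent_threshold` and `check_threshold_consistency`, the `[0.45, 0.55]` thresholds, and the 1e-5 tolerance are all assumptions, chosen only to reproduce the 0.5 vs 0.55 conflict reported in the log above.

```python
# Hypothetical pure-Python mirror of the binary LogisticRegression rule that
# `threshold` must agree with `thresholds`; not Spark's implementation.

TOLERANCE = 1e-5  # assumed tolerance for treating two values as equivalent


def equivalent_threshold(thresholds):
    """Return the single threshold implied by a 2-element `thresholds` list."""
    t0, t1 = thresholds
    return 1.0 / (1.0 + t0 / t1)


def check_threshold_consistency(params):
    """Raise ValueError if both params are present and disagree."""
    if "threshold" not in params or "thresholds" not in params:
        return
    threshold = params["threshold"]
    implied = equivalent_threshold(params["thresholds"])
    if abs(threshold - implied) > TOLERANCE:
        raise ValueError(
            f"inconsistent values for threshold ({threshold:g}) "
            f"and thresholds (equivalent to {implied:g})"
        )


# Params already on the estimator (threshold 0.5, as in the log).
estimator_params = {"maxIter": 10, "threshold": 0.5}
# Param map passed at fit time; its entries override earlier values.
param_map = {"maxIter": 20, "thresholds": [0.45, 0.55]}
merged = {**estimator_params, **param_map}

try:
    check_threshold_consistency(merged)
except ValueError as err:
    print(err)  # reports the 0.5 vs 0.55 conflict

# Supplying an equivalent threshold in the same param map avoids the error.
fixed = {**estimator_params, **param_map, "threshold": 0.55}
check_threshold_consistency(fixed)  # no exception
```

Any `thresholds` pair `[t0, t1]` with `t1 / (t0 + t1) == 0.55` would trigger the same conflict; the point is only that overriding `thresholds` without also adjusting `threshold` leaves the two inconsistent.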

## How was this patch tested?

Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13135 from HyukjinKwon/SPARK-15031.

(cherry picked from commit e2ec32d)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
4 participants