Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-9910][ML]User guide for train validation split #8377

Closed
wants to merge 7 commits into from

Conversation

zapletal-martin
Copy link
Contributor

No description provided.

@SparkQA
Copy link

SparkQA commented Aug 23, 2015

Test build #41417 has finished for PR 8377 at commit fb3b1d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample

@jkbradley
Copy link
Member

I'll take a look. FYI this may conflict with [https://github.com//pull/8304]

// TrainValidationSplit will try all combinations of values and determine best model using
// the evaluator.
val paramGrid = new ParamGridBuilder()
.addGrid(lr.regParam, Array(0.1, 0.01))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this simpler (maybe only a 2x2 grid).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to 2x2x3.

I wanted the example to be different from CrossValidator example. It only evaluates parameters of the LinearRegression rather than whole pipeline, it uses RegressionEvaluator as opposed to BinaryClassificationEvaluator and I also wanted to show how multiple parameter combinations can be evaluated.

@jkbradley
Copy link
Member

Just a few comments. Some Scala comments carry over to Java too. Thanks! (I like the examples.)

@@ -801,6 +801,173 @@ jsc.stop();

</div>

## Example: Model Selection via Train Validation Split
In addition to `CrossValidator` Spark also offers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would cut L805-808 since L811-819 says essentially the same thing but is better IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have kept L805 as introduction, deleted the others as suggested.

@zapletal-martin
Copy link
Contributor Author

Thanks for the comments! I have updated the review, ready for second pass.

@SparkQA
Copy link

SparkQA commented Aug 26, 2015

Test build #41652 has finished for PR 8377 at commit 6443d55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample

@jkbradley
Copy link
Member

By "remove the imports," I like @feynmanliang meant to remove the imports which are no longer necessary. It'd be nice to keep the ones explicitly used in the example code.

@SparkQA
Copy link

SparkQA commented Aug 27, 2015

Test build #41665 has finished for PR 8377 at commit 1dd1cd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample

case of `CrossValidator`. It is therefore less expensive, but will not produce as reliable results.

`TrainValidationSplit` takes an `Estimator`, a set of `ParamMap`s provided in the `estimatorParamMaps` parameter, and an
[`Evaluator`](api/scala/index.html#org.apache.spark.ml.Evaluator).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkbradley has told me in the past to keep the scala API links under the scala codetab and to create a separate version linking java API docs under java codetabs, for example

However, it looks like we have been inconsistent throughout ml-guide.md about this so I'm OK with how it looks now. Just wanted to flag it in case others have something to say.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fix inconsistency. It is bad to show links to Scala API doc to Python users. We can remove this link for now.


<div data-lang="java" markdown="1">
{% highlight java %}
import org.apache.spark.SparkConf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove SparkConf and JavaSparkContext

@mengxr
Copy link
Contributor

mengxr commented Aug 28, 2015

@zapletal-martin Do you have time to update this time? It would be nice to merge this first before I make a pass to the whole user guide to avoid merge conflicts. Or I can prepare a PR based on yours.

@zapletal-martin
Copy link
Contributor Author

@mengxr, I am planning to work on this in around 5 hours. Unfortunately I won't be able to progress it before that.

@mengxr
Copy link
Contributor

mengxr commented Aug 28, 2015

No problems. That should work. I will wait for your update before I make a global check:)

@SparkQA
Copy link

SparkQA commented Aug 29, 2015

Test build #41776 has finished for PR 8377 at commit dd3a0e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample

asfgit pushed a commit that referenced this pull request Aug 29, 2015
Author: martinzapletal <zapletal-martin@email.cz>

Closes #8377 from zapletal-martin/SPARK-9910.

(cherry picked from commit e8ea5ba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@asfgit asfgit closed this in e8ea5ba Aug 29, 2015
@mengxr
Copy link
Contributor

mengxr commented Aug 29, 2015

LGTM. Merged into master and branch-1.5. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants