-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-9910][ML]User guide for train validation split #8377
Conversation
Test build #41417 has finished for PR 8377 at commit
|
I'll take a look. FYI this may conflict with [https://github.com//pull/8304] |
// TrainValidationSplit will try all combinations of values and determine best model using | ||
// the evaluator. | ||
val paramGrid = new ParamGridBuilder() | ||
.addGrid(lr.regParam, Array(0.1, 0.01)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this simpler (maybe only a 2x2 grid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified to 2x2x3.
I wanted the example to be different from CrossValidator example. It only evaluates parameters of the LinearRegression rather than whole pipeline, it uses RegressionEvaluator as opposed to BinaryClassificationEvaluator and I also wanted to show how multiple parameter combinations can be evaluated.
Just a few comments. Some Scala comments carry over to Java too. Thanks! (I like the examples.) |
@@ -801,6 +801,173 @@ jsc.stop(); | |||
|
|||
</div> | |||
|
|||
## Example: Model Selection via Train Validation Split | |||
In addition to `CrossValidator` Spark also offers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would cut L805-808 since L811-819 says essentially the same thing but is better IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have kept L805 as introduction, deleted the others as suggested.
Thanks for the comments! I have updated the review, ready for second pass. |
Test build #41652 has finished for PR 8377 at commit
|
By "remove the imports," I like @feynmanliang meant to remove the imports which are no longer necessary. It'd be nice to keep the ones explicitly used in the example code. |
Test build #41665 has finished for PR 8377 at commit
|
case of `CrossValidator`. It is therefore less expensive, but will not produce as reliable results. | ||
|
||
`TrainValidationSplit` takes an `Estimator`, a set of `ParamMap`s provided in the `estimatorParamMaps` parameter, and an | ||
[`Evaluator`](api/scala/index.html#org.apache.spark.ml.Evaluator). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkbradley has told me in the past to keep the scala API links under the scala codetab and to create a separate version linking java API docs under java codetabs, for example
However, it looks like we have been inconsistent throughout ml-guide.md
about this so I'm OK with how it looks now. Just wanted to flag it in case others have something to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to fix inconsistency. It is bad to show links to Scala API doc to Python users. We can remove this link for now.
|
||
<div data-lang="java" markdown="1"> | ||
{% highlight java %} | ||
import org.apache.spark.SparkConf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove SparkConf
and JavaSparkContext
@zapletal-martin Do you have time to update this time? It would be nice to merge this first before I make a pass to the whole user guide to avoid merge conflicts. Or I can prepare a PR based on yours. |
@mengxr, I am planning to work on this in around 5 hours. Unfortunately I won't be able to progress it before that. |
No problems. That should work. I will wait for your update before I make a global check:) |
Test build #41776 has finished for PR 8377 at commit
|
Author: martinzapletal <zapletal-martin@email.cz> Closes #8377 from zapletal-martin/SPARK-9910. (cherry picked from commit e8ea5ba) Signed-off-by: Xiangrui Meng <meng@databricks.com>
LGTM. Merged into master and branch-1.5. Thanks! |
No description provided.