SPARK-10759 [MLlib] Add python example to model selection via cross-validation #11202

Closed
JeremyNixon wants to merge 5 commits

JeremyNixon
Contributor

Adds a Python example in the same form as the Scala and Java examples. It rendered successfully on my local server and ran through pyspark; the results match those from the Scala example.
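For readers unfamiliar with the technique, here is a minimal sketch of model selection via k-fold cross-validation over a parameter grid. It is plain Python, not the Spark API and not the example submitted in this PR. The toy model (`fit`), the metric (`mse`), the `reg` parameter, and the data are all hypothetical and chosen only to keep the sketch self-contained.

```python
from itertools import product
from statistics import mean


def k_fold_splits(n_rows, num_folds):
    """Yield (train_indices, validation_indices) for each of num_folds folds."""
    indices = list(range(n_rows))
    for fold in range(num_folds):
        # Every num_folds-th row, offset by the fold number, is held out.
        validation = indices[fold::num_folds]
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


def fit(xs, ys, reg):
    """Hypothetical toy model: 1-D least squares through the origin with L2 penalty reg."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + reg)


def mse(weight, xs, ys):
    """Mean squared error of the toy model on the given rows."""
    return mean((weight * x - y) ** 2 for x, y in zip(xs, ys))


def cross_validate(xs, ys, param_grid, num_folds=3):
    """Return (best_params, scores), where scores maps each candidate to its mean validation MSE."""
    names = sorted(param_grid)
    scores = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, validation in k_fold_splits(len(xs), num_folds):
            weight = fit([xs[i] for i in train], [ys[i] for i in train], **params)
            fold_scores.append(
                mse(weight, [xs[i] for i in validation], [ys[i] for i in validation])
            )
        scores[values] = mean(fold_scores)
    # Lower MSE is better, so pick the candidate with the smallest mean score.
    best = min(scores, key=scores.get)
    return dict(zip(names, best)), scores


# Made-up data with an exact linear relationship y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2 * x for x in xs]
best_params, scores = cross_validate(xs, ys, {"reg": [0.0, 1.0, 10.0]})
print(best_params)
```

Each candidate in the grid is trained on the training folds and scored on the held-out fold, and the candidate with the best average score is selected. The Spark examples discussed in this thread follow the same pattern with a pipeline, a parameter grid, and an evaluator.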

@sethah
Contributor

sethah commented Feb 17, 2016

One question is whether we should be using {% include_example %} when adding new examples to the documentation. We could separate it out into different PRs, but then we are duplicating code (this example already exists here).

@yinxusen could you advise?

@yinxusen
Contributor

It's better to reuse the existing cross validator example with {% include_example %} rather than writing it directly in the markdown file.

@JeremyNixon
Contributor Author

yinxusen, I have a PR here (#11240) with the solution to just this example that takes advantage of {% include_example %}, though accepting it would make this example inconsistent with the rest of the examples in this doc.

I'm also going to go ahead and create a branch with all of the examples converted to {% include_example %}, to make them consistent with one another and to take advantage of the automatic testing and DRYer code. Would you prefer that I submit that PR here, or create a new JIRA stating that all are fixed?

@JeremyNixon
Contributor Author

yinxusen, I went ahead and changed all of the examples in this doc to use {% include_example %}, and updated the Python cross-validation example. Let me know if it's acceptable to make all of the changes within this JIRA, or if you'd prefer they be separated.

I also noticed that a JIRA already exists (https://issues.apache.org/jira/browse/SPARK-13012) for converting the original examples to use {% include_example %}. Its PR makes the same changes that are here, except that it does not include the change to the Python cross-validation example that this JIRA is supposed to be for.

@mengxr
Contributor

mengxr commented Feb 23, 2016

ok to test

@SparkQA

SparkQA commented Feb 23, 2016

Test build #51791 has finished for PR 11202 at commit 14208ad.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@yinxusen
Contributor

@JeremyNixon Could you merge the PR with the master?

@JeremyNixon
Contributor Author

@yinxusen Will do. But I assume you mean changing the commits to make this consistent with https://issues.apache.org/jira/browse/SPARK-13012, which was merged in yesterday?

@yinxusen
Contributor

Yes. Can we close the issue, since you have #11240?

@JeremyNixon
Contributor Author

@yinxusen Two things:

1. I finished an implementation of train-validation-split so that we could finish the examples for this doc here: master...JeremyNixon:train_val_split_python. I'm not sure if it belongs in this PR or should be done separately.
2. #11240 requires one small update, which I can push now.

@yinxusen
Contributor

@JeremyNixon Please open a PR for the train-validation-split in PySpark under JIRA https://issues.apache.org/jira/browse/SPARK-12877. You can add the train-validation-split along with that JIRA. I think we can merge #11240 first.

@MLnick
Contributor

MLnick commented Apr 15, 2016

@JeremyNixon is this PR still alive?

@JeremyNixon
Contributor Author

@MLnick Dead PR. The work was completed with a combination of 230bbea and 02b1fef. Thanks for the heads up.

6 participants