[ML][docs][minor] Define LabeledDocument/Document classes in CV example #5135

Closed
wants to merge 2 commits into from

petro-rudenko
Contributor

To make the Cross-Validation example code snippet easier to copy/paste, LabeledDocument/Document need to be defined in it, since they are currently defined only in a previous example.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Mar 23, 2015

It seems reasonable, but adds a fair bit of code to the Java example. I'm not sure if the intent was that it be runnable, or simply illustrate a snippet of the core API usage. @mengxr

@mengxr
Contributor

mengxr commented Mar 24, 2015

I'm okay with this change, which makes the example self-contained and hence users can try it out easily.

@mengxr
Contributor

mengxr commented Mar 24, 2015

add to whitelist

@mengxr
Contributor

mengxr commented Mar 24, 2015

ok to test

// Labeled and unlabeled instance types.
// Spark SQL can infer schema from Java Beans.
public class Document implements Serializable {
private Long id;
Contributor

Long -> long
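
For reference, here is a minimal sketch of how the two Java bean classes might look with primitive field types, as the review comment suggests. Only the class declarations and the `id` field appear in this thread; the wrapper class `CvExampleBeans`, the constructors, the accessor bodies, and `main` are assumptions added so the sketch compiles as a single file.

```java
import java.io.Serializable;

// Hypothetical wrapper so the sketch compiles as one file;
// the name CvExampleBeans does not come from the PR.
class CvExampleBeans {

  // Unlabeled instance type. Spark SQL can infer the schema from
  // Java Beans, so each field is exposed through a getter/setter pair.
  public static class Document implements Serializable {
    private long id;      // primitive long rather than boxed Long
    private String text;

    public Document(long id, String text) {
      this.id = id;
      this.text = text;
    }

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
  }

  // Labeled instance type: a Document plus a numeric label.
  public static class LabeledDocument extends Document implements Serializable {
    private double label; // primitive double rather than boxed Double

    public LabeledDocument(long id, String text, double label) {
      super(id, text);
      this.label = label;
    }

    public double getLabel() { return label; }
    public void setLabel(double label) { this.label = label; }
  }

  public static void main(String[] args) {
    // Sample values are illustrative only.
    LabeledDocument doc = new LabeledDocument(0L, "a b c d e spark", 1.0);
    System.out.println(doc.getId() + " | " + doc.getText() + " | " + doc.getLabel());
  }
}
```

Primitive fields cannot be null, which is presumably the motivation for the suggestion: the columns hold plain numeric values and need no boxing.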

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29051 has started for PR 5135 at commit 1d35383.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29051 has finished for PR 5135 at commit 1d35383.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LabeledDocument(id: Long, text: String, label: Double)
    • case class Document(id: Long, text: String)
    • public class Document implements Serializable
    • public class LabeledDocument extends Document implements Serializable

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29051/

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29088 has started for PR 5135 at commit 5190c75.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29088 has finished for PR 5135 at commit 5190c75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LabeledDocument(id: Long, text: String, label: Double)
    • case class Document(id: Long, text: String)
    • public class Document implements Serializable
    • public class LabeledDocument extends Document implements Serializable

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29088/

@@ -655,6 +660,36 @@ import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

// Labeled and unlabeled instance types.
Member

OK so this is intentionally duplicated from the example above? I guess that's reasonable since the point is to be self-contained, and I don't imagine there's a lot of maintenance overhead in trying to evolve both copies together.

Contributor Author

Yes, it's annoying when you copy/paste a bunch of code into the Spark shell and it fails because these classes are not declared.

asfgit pushed a commit that referenced this pull request Mar 24, 2015
To make the Cross-Validation example code snippet easier to copy/paste, LabeledDocument/Document need to be defined in it, since they are currently defined only in a previous example.

Author: Peter Rudenko <petro.rudenko@gmail.com>

Closes #5135 from petro-rudenko/patch-3 and squashes the following commits:

5190c75 [Peter Rudenko] Fix primitive types for java examples.
1d35383 [Peter Rudenko] [SQL][docs][minor] Define LabeledDocument/Document classes in CV example

(cherry picked from commit 08d4528)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 08d4528 Mar 24, 2015