[SPARK-11965] [ML] [Doc] Update user guide for RFormula feature interactions #10222

Closed
wants to merge 8 commits into from

yanboliang
Contributor

Update the user guide for RFormula feature interactions. This PR also documents other features new in Spark 1.6, such as support for string labels.

@SparkQA

SparkQA commented Dec 9, 2015

Test build #47420 has finished for PR 10222 at commit b08e063.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47520 has finished for PR 10222 at commit b08e063.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47562 has finished for PR 10222 at commit b08e063.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

`RFormula` produces a vector column of features and a double or string column of label.
Like when formulas are used in R for linear regression, string input columns will be one-hot encoded, and numeric columns will be cast to doubles.
If the label column is string type, it will be first transformed to double label with `StringIndexer`.
If the label does not already present in the DataFrame, the output label column will be created from the specified response variable in the formula.
Contributor

"If the label is not"

@BenFradet
Contributor

A few remarks regarding phrasing but otherwise it lgtm.
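
For readers following the quoted user-guide passage about string labels, the indexing step can be illustrated with a minimal plain-Python sketch. It mimics the default behavior of `StringIndexer`, which orders labels by frequency so that the most frequent label gets index 0.0. The helper name `index_labels` is hypothetical, this is not Spark code, and the alphabetical tie-break between equally frequent labels is an assumption of the sketch.

```python
from collections import Counter


def index_labels(labels):
    """Map string labels to double indices, most frequent label first.

    Hypothetical helper that mimics the default frequency ordering of
    Spark's StringIndexer; ties are broken alphabetically (an assumption).
    """
    counts = Counter(labels)
    # Sort by descending frequency, then by label text for determinism.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[label] for label in labels], mapping


indexed, mapping = index_labels(["yes", "no", "yes", "yes", "no", "maybe"])
print(mapping)
print(indexed)
```

Here `"yes"` is the most frequent label, so it maps to 0.0, followed by `"no"` and `"maybe"`.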

@SparkQA

SparkQA commented Dec 16, 2015

Test build #47799 has finished for PR 10222 at commit a98f0af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


Suppose `a` and `b` are double columns, we use the following simple examples to illustrate the effect of `RFormula`:

* `y ~ a + b` means model `y = w0 + w1 * a + w2 * b` where `w0` is the intercept and `w1, w2` are coefficients
Contributor

Change `=` to `~`, because the model family is not yet determined.
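
The formula semantics under discussion can be sketched in plain Python, assuming numeric columns only. Each term contributes one feature, and an interaction term such as `a:b` (the subject of this PR) contributes the product of its columns. The function name `expand_terms` and the sample row are hypothetical; this is an illustration of the term expansion, not Spark's implementation. The intercept `w0` is not part of the feature vector, since the downstream model fits it.

```python
def expand_terms(row, terms):
    """Build a feature vector for numeric columns from formula terms.

    Each term is either a column name ("a") or an interaction ("a:b"),
    which contributes the product of the named columns.
    """
    features = []
    for term in terms:
        value = 1.0
        for column in term.split(":"):
            value *= float(row[column])
        features.append(value)
    return features


row = {"a": 2.0, "b": 3.0}  # hypothetical sample row
main_effects = expand_terms(row, ["a", "b"])             # terms of y ~ a + b
with_interaction = expand_terms(row, ["a", "b", "a:b"])  # terms of y ~ a + b + a:b
print(main_effects)
print(with_interaction)
```

With the interaction term, the feature vector gains a third entry equal to `a * b`, which corresponds to an extra coefficient in the fitted model.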

@mengxr
Contributor

mengxr commented Jan 19, 2016

Made some minor comments inline. We should consider copying the content to the API doc to keep the API doc and user guide in sync. It would be nice if we could define a process for updating the docs. My recommendation is to keep the Scala API doc up to date and then derive the user guide and the Python/R API docs from it.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49786 has finished for PR 10222 at commit 3b9e886.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49787 has finished for PR 10222 at commit feac366.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49790 has finished for PR 10222 at commit bc6ed66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*
* The basic operators are:
*
* `~` separate target and terms
Contributor

This is not the correct ScalaDoc syntax for a list. See https://wiki.scala-lang.org/display/SW/Syntax. You can run build/sbt mllib/doc to check the generated html API doc.

@mengxr
Contributor

mengxr commented Jan 20, 2016

LGTM except one minor issue on ScalaDoc syntax.

@SparkQA

SparkQA commented Jan 21, 2016

Test build #49850 has finished for PR 10222 at commit 7def89a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Jan 25, 2016

Merged into master. Thanks!

@asfgit asfgit closed this in dd2325d Jan 25, 2016
@yanboliang yanboliang deleted the spark-11965 branch January 26, 2016 02:36