[SPARK-11965] [ML] [Doc] Update user guide for RFormula feature interactions #10222

yanboliang · 2015-12-09T08:28:59Z

Update the user guide for RFormula feature interactions. We also document other features new in Spark 1.6, such as support for string labels.

SparkQA · 2015-12-09T08:47:40Z

Test build #47420 has finished for PR 10222 at commit b08e063.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-10T15:00:41Z

Test build #47520 has finished for PR 10222 at commit b08e063.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2015-12-11T02:31:23Z

Jenkins, test this please.

SparkQA · 2015-12-11T02:58:53Z

Test build #47562 has finished for PR 10222 at commit b08e063.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

BenFradet · 2015-12-14T14:57:09Z

docs/ml-features.md

+`RFormula` produces a vector column of features and a double or string column of label. 
+Like when formulas are used in R for linear regression, string input columns will be one-hot encoded, and numeric columns will be cast to doubles.
+If the label column is string type, it will be first transformed to double label with `StringIndexer`.
+If the label does not already present in the DataFrame, the output label column will be created from the specified response variable in the formula.


Typo: "If the label does not already present" should read "If the label is not already present".
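The behavior the quoted paragraph describes can be modeled without Spark. The sketch below is a simplified, hypothetical stand-in (`RFormulaLabelSketch`, `indexLabels` and `oneHot` are illustrative names, not Spark API). It indexes a string label the way `StringIndexer` does by default, with the most frequent label mapped to `0.0`, casts a numeric column to double, and one-hot encodes a string column. Breaking frequency ties alphabetically and keeping every category in the one-hot vector are assumptions of this sketch; Spark's own encoding may differ in those details.

```scala
// Spark-free, simplified model of RFormula's column handling as described above.
// Not Spark's implementation; all names here are hypothetical.
object RFormulaLabelSketch {

  // StringIndexer-style indexing: the most frequent label gets index 0.0.
  // Assumption of this sketch: ties are broken alphabetically.
  def indexLabels(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  // One-hot encode a string column: one 0/1 slot per distinct category.
  // Categories are sorted alphabetically here and none is dropped.
  def oneHot(values: Seq[String]): Seq[Vector[Double]] = {
    val categories = values.distinct.sorted
    values.map(v => categories.map(c => if (c == v) 1.0 else 0.0).toVector)
  }

  def main(args: Array[String]): Unit = {
    val labels  = Seq("yes", "no", "yes", "maybe", "yes", "no") // string label
    val country = Seq("US", "CA", "US", "NZ", "CA", "US")       // string feature
    val hour    = Seq(18, 12, 15, 9, 20, 14)                    // integer feature

    val labelIndex = indexLabels(labels)
    val labelCol   = labels.map(labelIndex) // string label -> double label
    val hourCol    = hour.map(_.toDouble)   // numeric column cast to double
    val countryCol = oneHot(country)        // string column one-hot encoded

    // One feature vector per row: one-hot(country) followed by hour.
    val features = countryCol.zip(hourCol).map { case (oh, h) => oh :+ h }
    features.zip(labelCol).foreach { case (f, l) => println(s"$f -> $l") }
  }
}
```

With this data, `"yes"` (three occurrences) maps to `0.0`, `"no"` to `1.0` and `"maybe"` to `2.0`.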

BenFradet · 2015-12-14T15:03:38Z

A few remarks regarding phrasing, but otherwise LGTM.

SparkQA · 2015-12-16T07:43:38Z

Test build #47799 has finished for PR 10222 at commit a98f0af.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2016-01-19T22:19:16Z

docs/ml-features.md

+
+Suppose `a` and `b` are double columns, we use the following simple examples to illustrate the effect of `RFormula`:
+
+* `y ~ a + b` means model `y = w0 + w1 * a + w2 * b` where `w0` is the intercept and `w1, w2` are coefficients


Replace `=` with `~` here, because the model family is not yet determined.
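To make the `~` notation concrete, here is a small, Spark-free Scala sketch that expands a tiny subset of R formula syntax (`+`, `-`, `:`, and `- 1` / `+ 0` for dropping the intercept) into the model it denotes, in the same `y ~ w0 + w1 * a + ...` style as the quoted doc. `FormulaSketch` and `describe` are hypothetical names; this is not how Spark's `RFormula` parses formulas, and it ignores `.` and column types.

```scala
import scala.collection.mutable

// Hypothetical, Spark-free helper: expands a small subset of R formula
// syntax into the model it denotes. Not Spark's RFormula parser.
object FormulaSketch {

  def describe(formula: String): String = {
    val parts = formula.split("~").map(_.trim)
    require(parts.length == 2, s"expected exactly one '~' in: $formula")
    val (response, rhs) = (parts(0), parts(1))

    // Turn "a + b + a:b - 1" into signed tokens: +a, +b, +a:b, -1.
    // Tokens that are only a sign (e.g. from a leading "-") are dropped.
    val tokens = ("+" + rhs.replaceAll("\\s+", ""))
      .split("(?=[+-])")
      .filter(_.length > 1)
      .map(t => (t.head, t.tail))

    var intercept = true
    val added = mutable.ArrayBuffer.empty[String]
    val removed = mutable.Set.empty[String]
    tokens.foreach {
      case ('-', "1") | ('+', "0") => intercept = false // drop the intercept
      case ('+', "1")              => intercept = true
      case ('-', term)             => removed += term   // "- b" removes b
      case (_, term)               => if (!added.contains(term)) added += term
    }

    // "a:b" is an interaction, written here as the product a * b.
    val terms = added.filterNot(removed.contains).map(_.split(":").mkString(" * "))
    val weighted = terms.zipWithIndex.map { case (t, i) => s"w${i + 1} * $t" }
    val rhsModel = if (intercept) "w0" +: weighted else weighted
    s"$response ~ ${rhsModel.mkString(" + ")}"
  }

  def main(args: Array[String]): Unit = {
    Seq("y ~ a + b", "y ~ a + b + a:b - 1", "y ~ a + b - b").foreach { f =>
      println(s"$f  =>  ${describe(f)}")
    }
  }
}
```

For example, `describe("y ~ a + b + a:b - 1")` yields `y ~ w1 * a + w2 * b + w3 * a * b`: the interaction `a:b` becomes a product term and `- 1` drops `w0`.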

mengxr · 2016-01-19T22:23:31Z

Made some minor comments inline. We should consider copying this content into the API doc so that the API doc and the user guide stay in sync. It would be nice to define a process for updating the docs. My recommendation is to keep the Scala API doc up to date and derive the user guide and the Python/R API docs from it.

SparkQA · 2016-01-20T15:30:34Z

Test build #49786 has finished for PR 10222 at commit 3b9e886.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-20T15:35:45Z

Test build #49787 has finished for PR 10222 at commit feac366.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-20T16:54:28Z

Test build #49790 has finished for PR 10222 at commit bc6ed66.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2016-01-20T18:44:18Z

mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala

+ *
+ * The basic operators are:
+ *
+ * `~` separate target and terms


This is not the correct ScalaDoc syntax for a list. See https://wiki.scala-lang.org/display/SW/Syntax. You can run `build/sbt mllib/doc` to check the generated HTML API doc.
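For comparison, a minimal sketch of ScalaDoc's unordered-list syntax: after the comment's leading `*`, each item begins with whitespace, then `-`, then a space, and the list is separated from the preceding sentence by a blank comment line. `FormulaOperatorsDoc` is a hypothetical object; the operator descriptions paraphrase the formula operators `RFormula` supports.

```scala
/**
 * Hypothetical example showing ScalaDoc list syntax.
 *
 * The basic operators are:
 *
 *  - `~` separate target and terms
 *  - `+` concat terms, "+ 0" means removing intercept
 *  - `-` remove a term, "- 1" means removing intercept
 *  - `:` interaction (multiplication for numeric values or binarized categorical values)
 *  - `.` all columns except target
 */
object FormulaOperatorsDoc {
  // The same operators as data, so the object has something checkable.
  val operators: Seq[(String, String)] = Seq(
    "~" -> "separate target and terms",
    "+" -> "concat terms, \"+ 0\" means removing intercept",
    "-" -> "remove a term, \"- 1\" means removing intercept",
    ":" -> "interaction (multiplication for numeric values or binarized categorical values)",
    "." -> "all columns except target"
  )
}
```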

mengxr · 2016-01-20T18:46:21Z

LGTM except for one minor issue with the ScalaDoc syntax.

SparkQA · 2016-01-21T05:31:49Z

Test build #49850 has finished for PR 10222 at commit 7def89a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2016-01-25T19:52:25Z

Merged into master. Thanks!

Update user guide for RFormula feature interactions

b08e063

BenFradet reviewed Dec 14, 2015

update doc

a98f0af

mengxr reviewed Jan 19, 2016

yanboliang added 2 commits January 20, 2016 23:05

address comments

3b9e886

copy user guide doc to Scala API doc

feac366

yanboliang added 3 commits January 20, 2016 23:42

cut too long line

346cd21

fix typos

fccdc90

fix typos

bc6ed66

mengxr reviewed Jan 20, 2016

Fix list issue of Scala API doc

7def89a

asfgit closed this in dd2325d Jan 25, 2016

yanboliang deleted the spark-11965 branch January 26, 2016 02:36
