diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
index 79cf37cd60a98..25271de0b132b 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
@@ -47,18 +47,17 @@ private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol {
  * the R formula docs here: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html
  *
  * The basic operators are:
- *
- * `~` separate target and terms
- * `+` concat terms, "+ 0" means removing intercept
- * `-` remove a term, "- 1" means removing intercept
- * `:` interaction (multiplication for numeric values, or binarized categorical values)
- * `.` all columns except target
+ * - `~` separate target and terms
+ * - `+` concat terms, "+ 0" means removing intercept
+ * - `-` remove a term, "- 1" means removing intercept
+ * - `:` interaction (multiplication for numeric values, or binarized categorical values)
+ * - `.` all columns except target
  *
  * Suppose `a` and `b` are double columns, we use the following simple examples
  * to illustrate the effect of `RFormula`:
- * `y ~ a + b` means model `y ~ w0 + w1 * a + w2 * b` where `w0` is the intercept and `w1, w2`
+ * - `y ~ a + b` means model `y ~ w0 + w1 * a + w2 * b` where `w0` is the intercept and `w1, w2`
  * are coefficients.
- * `y ~ a + b + a:b - 1` means model `y ~ w1 * a + w2 * b + w3 * a * b` where `w1, w2, w3`
+ * - `y ~ a + b + a:b - 1` means model `y ~ w1 * a + w2 * b + w3 * a * b` where `w1, w2, w3`
  * are coefficients.
  *
  * RFormula produces a vector column of features and a double or string column of label.