Commit

fix typo
Wayne Zhang committed May 20, 2017
1 parent 341949c commit 24818a7
Showing 1 changed file with 17 additions and 15 deletions.
32 changes: 17 additions & 15 deletions mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
@@ -38,33 +38,35 @@ import org.apache.spark.sql.types._
private[feature] trait RFormulaBase extends HasFeaturesCol with HasLabelCol {

/**
* Param for how to order categories of a FEATURE string column used by `StringIndexer`.
* Param for how to order categories of a string FEATURE column used by `StringIndexer`.
* The last category after ordering is dropped when encoding strings.
* The options are explained using an example string: 'b', 'a', 'b', 'a', 'c', 'b'
* {{{
* +-----------------+---------------------------------------+---------------------------------+
* | Option | Category mapped to 0 by StringIndexer | Category dropped by RFormula |
* +-----------------+---------------------------------------+---------------------------------+
* | 'frequencyDesc' | most frequent category ('b') | least frequent category ('c') |
* | 'frequencyAsc' | least frequent category ('c') | most frequent category ('b') |
* | 'alphabetDesc' | first alphabetical category ('a') | last alphabetical category ('c')|
* | 'alphabetAsc' | last alphabetical category ('c') | last alphabetical category ('a')|
* +-----------------+---------------------------------------+---------------------------------+
* }}}
* Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
* The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', `RFormula`
* drops the same category as R when encoding strings.
*
* The options are explained using an example `'b', 'a', 'b', 'a', 'c', 'b'`:
* {{{
* +-----------------+---------------------------------------+----------------------------------+
* | Option | Category mapped to 0 by StringIndexer | Category dropped by RFormula |
* +-----------------+---------------------------------------+----------------------------------+
* | 'frequencyDesc' | most frequent category ('b') | least frequent category ('c') |
* | 'frequencyAsc' | least frequent category ('c') | most frequent category ('b') |
* | 'alphabetDesc' | first alphabetical category ('a') | last alphabetical category ('c') |
* | 'alphabetAsc' | last alphabetical category ('c') | first alphabetical category ('a')|
* +-----------------+---------------------------------------+----------------------------------+
* }}}
* Note that this ordering option is NOT used for the label column. When the label column is
* indexed, it uses the default descending frequency ordering in `StringIndexer`.
*
* @group param
*/
@Since("2.3.0")
final val stringIndexerOrderType: Param[String] = new Param(this, "stringIndexerOrderType",
"How to order categories of a FEATURE string column used by StringIndexer. " +
"How to order categories of a string FEATURE column used by StringIndexer. " +
"The last category after ordering is dropped when encoding strings. " +
s"Supported options: ${StringIndexer.supportedStringOrderType.mkString(", ")}. " +
"The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', " +
"RFormula drops the same category as R when encoding strings." +
s"Supported options: ${StringIndexer.supportedStringOrderType.mkString(", ")}.",
"RFormula drops the same category as R when encoding strings.",
ParamValidators.inArray(StringIndexer.supportedStringOrderType))

/** @group getParam */
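The doc comment says that categories are first ordered, that `StringIndexer` maps the first category to 0, and that `RFormula` drops the last one. Below is a minimal plain-Scala sketch of that rule for the two frequency-based options only. It does not call Spark. `OrderSketch` and its methods are hypothetical names, and breaking ties between equally frequent categories alphabetically is an assumption of this sketch, not something the diff states.

```scala
// Plain-Scala sketch (no Spark dependency) of the rule in the doc comment:
// categories are ordered, the first one gets index 0, and RFormula drops the last one.
object OrderSketch {
  // Hypothetical helper: order the distinct categories of `values`.
  // Only the two frequency-based options are modeled here.
  def orderCategories(values: Seq[String], orderType: String): Seq[String] = {
    // Count how often each category occurs.
    val counts: Map[String, Int] =
      values.groupBy(identity).map { case (category, occurrences) => category -> occurrences.size }
    // Sort by name first; the stable sort by count below then breaks ties alphabetically.
    val byName: Seq[(String, Int)] = counts.toSeq.sortBy(_._1)
    orderType match {
      case "frequencyDesc" => byName.sortBy(-_._2).map(_._1)
      case "frequencyAsc"  => byName.sortBy(_._2).map(_._1)
      case other =>
        throw new IllegalArgumentException(s"Option not modeled in this sketch: $other")
    }
  }

  // Category that StringIndexer would map to index 0: the first after ordering.
  def indexZero(values: Seq[String], orderType: String): String =
    orderCategories(values, orderType).head

  // Category that RFormula drops: the last after ordering.
  def dropped(values: Seq[String], orderType: String): String =
    orderCategories(values, orderType).last
}

// The example column from the doc comment.
val column = Seq("b", "a", "b", "a", "c", "b")
for (opt <- Seq("frequencyDesc", "frequencyAsc")) {
  val order = OrderSketch.orderCategories(column, opt).mkString(", ")
  println(s"$opt: order = [$order], index 0 = ${OrderSketch.indexZero(column, opt)}, " +
    s"dropped = ${OrderSketch.dropped(column, opt)}")
}
```

On the example column, `frequencyDesc` yields the order b, a, c (so 'b' gets index 0 and 'c' is dropped) and `frequencyAsc` yields c, a, b (so 'c' gets index 0 and 'b' is dropped), which agrees with the first two rows of the table.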

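Besides the wording change, the `Param` description hunk also fixes how the description string is concatenated: in the old version, `"...when encoding strings." + s"Supported options: ..."` has no space between the two pieces. The sketch below rebuilds both strings from the interleaved diff lines so the difference is visible. It does not use Spark. It assumes `StringIndexer.supportedStringOrderType` holds the four values listed in the doc comment, in that order, and `isValidOrderType` is a hypothetical stand-in for the check that `ParamValidators.inArray` performs.

```scala
// Plain-Scala sketch (no Spark dependency) contrasting the param description
// string before and after this commit, plus an inArray-style validity check.
object DescriptionSketch {
  // Assumption: the same four values listed in the doc comment, in that order.
  val supported: Array[String] =
    Array("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")

  // Old concatenation: no space between "strings." and "Supported".
  val oldDoc: String =
    "How to order categories of a FEATURE string column used by StringIndexer. " +
      "The last category after ordering is dropped when encoding strings. " +
      "The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', " +
      "RFormula drops the same category as R when encoding strings." +
      s"Supported options: ${supported.mkString(", ")}."

  // New concatenation: "Supported options" moved up and followed by a space.
  val newDoc: String =
    "How to order categories of a string FEATURE column used by StringIndexer. " +
      "The last category after ordering is dropped when encoding strings. " +
      s"Supported options: ${supported.mkString(", ")}. " +
      "The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', " +
      "RFormula drops the same category as R when encoding strings."

  // Hypothetical stand-in for ParamValidators.inArray: accept only listed values.
  def isValidOrderType(value: String): Boolean = supported.contains(value)
}

println(DescriptionSketch.oldDoc)
println(DescriptionSketch.newDoc)
println(DescriptionSketch.isValidOrderType("alphabetAsc"))
println(DescriptionSketch.isValidOrderType("alphabeticalAsc"))
```

Printing `oldDoc` shows the run-together text `...when encoding strings.Supported options: ...`, while `newDoc` reads as separate sentences.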