[SPARK-5891][ML] Add Binarizer ML Transformer #5699

Closed
wants to merge 5 commits into from

viirya
Member

@viirya viirya commented Apr 25, 2015

@SparkQA

SparkQA commented Apr 25, 2015

Test build #30952 has finished for PR 5699 at commit 1682f8c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@viirya
Member Author

viirya commented Apr 25, 2015

The failure is caused by an unrelated test in streaming.

@viirya
Member Author

viirya commented Apr 25, 2015

please retest.

@mengxr
Contributor

mengxr commented Apr 27, 2015

test this please

*/
@AlphaComponent
final class Binarizer extends Transformer
with HasInputCol with HasOutputCol with HasThreshold {
Contributor

There is a problem with HasThreshold: its doc says "threshold used in binary classification", which does not describe how Binarizer uses it. Maybe we should implement a `threshold` param in Binarizer itself and document it correctly. Also, we need to document what the output is when the input equals the threshold.
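
For illustration, the boundary case raised in this comment can be sketched in plain Scala, without Spark. This is a minimal sketch, not the code from this PR: the object name `BinarizeSketch` and the helper `binarize` are hypothetical, and the rule that an input equal to the threshold maps to 0.0 is an assumption that follows the convention later stated in Spark's Binarizer documentation.

```scala
// Hypothetical stand-alone sketch; names are illustrative, not from the PR.
object BinarizeSketch {
  // Assumed rule: values strictly greater than the threshold map to 1.0;
  // values equal to or less than the threshold map to 0.0.
  def binarize(value: Double, threshold: Double): Double =
    if (value > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val data = Array(0.1, -0.5, 0.2, -0.3, 0.8, 0.7, -0.1, -0.4)
    val threshold = 0.2 // chosen so that one input sits exactly on the boundary
    val binarized = data.map(binarize(_, threshold))
    // The input 0.2 equals the threshold and therefore yields 0.0.
    println(binarized.mkString(", "))
  }
}
```

Whichever side of the boundary is chosen, stating it in the param doc removes the ambiguity the comment points out.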

@SparkQA

SparkQA commented Apr 27, 2015

Test build #30972 has finished for PR 5699 at commit 1682f8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 28, 2015

Test build #31124 has finished for PR 5699 at commit cc4f03c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Binarizer extends Transformer with HasInputCol with HasOutputCol
    • trait ExpectsInputTypes
    • abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String)
    • case class Pow(left: Expression, right: Expression) extends BinaryMathExpression(math.pow, "POWER")
    • case class Hypot(
    • case class Atan2(
    • abstract class MathematicalExpression(name: String)
    • abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)
    • abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
    • abstract class MathematicalExpressionForFloat(f: Float => Float, name: String)
    • abstract class MathematicalExpressionForLong(f: Long => Long, name: String)
    • case class Sin(child: Expression) extends MathematicalExpressionForDouble(math.sin, "SIN")
    • case class Asin(child: Expression) extends MathematicalExpressionForDouble(math.asin, "ASIN")
    • case class Sinh(child: Expression) extends MathematicalExpressionForDouble(math.sinh, "SINH")
    • case class Cos(child: Expression) extends MathematicalExpressionForDouble(math.cos, "COS")
    • case class Acos(child: Expression) extends MathematicalExpressionForDouble(math.acos, "ACOS")
    • case class Cosh(child: Expression) extends MathematicalExpressionForDouble(math.cosh, "COSH")
    • case class Tan(child: Expression) extends MathematicalExpressionForDouble(math.tan, "TAN")
    • case class Atan(child: Expression) extends MathematicalExpressionForDouble(math.atan, "ATAN")
    • case class Tanh(child: Expression) extends MathematicalExpressionForDouble(math.tanh, "TANH")
    • case class Ceil(child: Expression) extends MathematicalExpressionForDouble(math.ceil, "CEIL")
    • case class Floor(child: Expression) extends MathematicalExpressionForDouble(math.floor, "FLOOR")
    • case class Rint(child: Expression) extends MathematicalExpressionForDouble(math.rint, "ROUND")
    • case class Cbrt(child: Expression) extends MathematicalExpressionForDouble(math.cbrt, "CBRT")
    • case class Signum(child: Expression) extends MathematicalExpressionForDouble(math.signum, "SIGNUM")
    • case class ISignum(child: Expression) extends MathematicalExpressionForInt(math.signum, "ISIGNUM")
    • case class FSignum(child: Expression) extends MathematicalExpressionForFloat(math.signum, "FSIGNUM")
    • case class LSignum(child: Expression) extends MathematicalExpressionForLong(math.signum, "LSIGNUM")
    • case class ToDegrees(child: Expression)
    • case class ToRadians(child: Expression)
    • case class Log(child: Expression) extends MathematicalExpressionForDouble(math.log, "LOG")
    • case class Log10(child: Expression) extends MathematicalExpressionForDouble(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends MathematicalExpressionForDouble(math.log1p, "LOG1P")
    • case class Exp(child: Expression) extends MathematicalExpressionForDouble(math.exp, "EXP")
    • case class Expm1(child: Expression) extends MathematicalExpressionForDouble(math.expm1, "EXPM1")
  • This patch does not change any dependencies.

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.attribute.BinaryAttribute
import org.apache.spark.ml.param._
import org.apache.spark.ml.param.shared._
Contributor

Only HasInputCol and HasOutputCol are used, so this import could name them explicitly instead of using a wildcard.

@viirya
Member Author

viirya commented May 1, 2015

@mengxr updated. Thanks for the comments.

@SparkQA

SparkQA commented May 1, 2015

Test build #31550 has finished for PR 5699 at commit 1a0b9a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Binarizer extends Transformer with HasInputCol with HasOutputCol

@viirya
Member Author

viirya commented May 1, 2015

retest this please.

@SparkQA

SparkQA commented May 1, 2015

Test build #31553 has finished for PR 5699 at commit 1a0b9a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Binarizer extends Transformer with HasInputCol with HasOutputCol

  override def beforeAll(): Unit = {
    super.beforeAll()
    sqlContext = new SQLContext(sc)
    data = Array(0.1, -0.5, 0.2, -0.3, 0.8, 0.7, -0.1, -0.4)
Contributor

minor: data could be a val
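
A minimal sketch of what this suggestion could look like, written as plain Scala rather than as the actual test suite: the fixture is a `val` initialized at its declaration instead of being assigned inside `beforeAll()`. The object name `BinarizerDataSketch`, the `expected` field, and the threshold of 0.0 are assumptions made for illustration.

```scala
// Hypothetical fixture object; not the test suite from this PR.
object BinarizerDataSketch {
  // A val suffices: the input data is fixed and never reassigned.
  val data: Array[Double] = Array(0.1, -0.5, 0.2, -0.3, 0.8, 0.7, -0.1, -0.4)

  // Expected labels under an assumed threshold of 0.0
  // (strictly greater than the threshold maps to 1.0).
  val expected: Array[Double] = data.map(x => if (x > 0.0) 1.0 else 0.0)
}
```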

@mengxr
Contributor

mengxr commented May 1, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 7630213 May 1, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
JIRA: https://issues.apache.org/jira/browse/SPARK-5891

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#5699 from viirya/add_binarizer and squashes the following commits:

1a0b9a4 [Liang-Chi Hsieh] For comments.
bc397f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
cc4f03c [Liang-Chi Hsieh] Implement threshold param and use merged params map.
7564c63 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
1682f8c [Liang-Chi Hsieh] Add Binarizer ML Transformer.
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@viirya viirya deleted the add_binarizer branch December 27, 2023 18:17