Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8279][SQL]Add math function round #6938

Closed
wants to merge 21 commits into from

Conversation

yjshen
Copy link
Member

@yjshen yjshen commented Jun 22, 2015

@rxin
Copy link
Contributor

rxin commented Jun 22, 2015

Jenkins, add to whitelist.

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35467 has finished for PR 6938 at commit 0495feb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(children: Seq[Expression]) extends Expression
    • trait BigDecimalConverter[T]

@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35469 has finished for PR 6938 at commit c5ce169.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(children: Seq[Expression]) extends Expression
    • trait BigDecimalConverter[T]

@chenghao-intel
Copy link
Contributor

duplicated with #6836?

@yjshen
Copy link
Member Author

yjshen commented Jun 23, 2015

@chenghao-intel, Yep, not aware there are two JIRAs for Round. Mind looking at this as well?

@@ -312,3 +315,90 @@ case class Logarithm(left: Expression, right: Expression)
"""
}
}

case class Round(children: Seq[Expression]) extends Expression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we support multiple constructors now in expression, see https://github.com/apache/spark/pull/6806/files#diff-d788f93e29b4d25cdd7d60328587678bR229

@chenghao-intel
Copy link
Contributor

As most of issues that I raised is solved in #6836, do you mind jump there and give some comments?
BTW, as the https://issues.apache.org/jira/browse/SPARK-8279 suggested, perhaps we'd better focus on the test case udf_round_3 test, which is not covered by #6836, what do you think?

@yjshen
Copy link
Member Author

yjshen commented Jun 23, 2015

@chenghao-intel, I think the main difference between this and #6836 is whether to make Round act as GenericUDFRound, if we want to support udf_round and udf_round_3.
I suppose following Hive's semantic is a good choice.

@SparkQA
Copy link

SparkQA commented Jun 23, 2015

Test build #35521 has finished for PR 6938 at commit be02d5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor

Yes, #6836 follows the Hive's GenericUDFRound, but we should keep the output / input to o.a.s.s.types.Decimal, not BigDecimal (if it's a DecimalType), otherwise it will causes problem when interact with other Spark SQL expression. See https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L291

@yjshen
Copy link
Member Author

yjshen commented Jun 23, 2015

Oh, I think there exists misunderstood of the round method and BigDecimalConverter, BDC just as as a implicit type parameter of round to infer the first argument's type, for example Byte, call the corresponding constructor of BigDecimal and use it's setScale method to round, at last, round is using fromBigDecimal to convert the dataType back, i.e. to Byte in our example.

Therefore, I preserve the dataType of children(0), the only exceptions are String and Binary which would be regarded as Double in Hive.

@chenghao-intel
Copy link
Contributor

Oh, sorry, you did use the Decimal, for Round.eval, ignore my previous comment.

@SparkQA
Copy link

SparkQA commented Jun 23, 2015

Test build #35539 has finished for PR 6938 at commit 40f4a99.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(child: Expression, scale: Expression) extends Expression
    • trait BigDecimalConverter[T]


def children: Seq[Expression] = Seq(child, scale)

def nullable: Boolean = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on child.nullable || scala.nullable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably more than that, Hive support String ,Double.NaN, Double.Infinity as input, all of these would result in null result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's say if the both children are literals e.g. Literal(123.0, FloatType) and Literal(1. IntegerType), still be nullable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not the end of the world to have nullable be more conservative, since it is technically correct to be nullable. however, if there is a way to do a more accurate way to determine nullability, we should do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nullable will be great useful in the expression optimization, we'd better handle it properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean nullable is sometimes determined at runtime, a not null string, double.NaN is not null themselves, but would eval to null in Round.

@yjshen yjshen changed the title [SPARK-8279][SQL][WIP]Add math function round [SPARK-8279][SQL]Add math function round Jun 23, 2015
@SparkQA
Copy link

SparkQA commented Jun 23, 2015

Test build #35555 has finished for PR 6938 at commit 479fa9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(child: Expression, scale: Expression) extends Expression
    • trait BigDecimalConverter[T]

@yjshen
Copy link
Member Author

yjshen commented Jun 23, 2015

@rxin @marmbrus, mind reviewing this?

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35660 has finished for PR 6938 at commit c14f64d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(child: Expression, scale: Expression) extends Expression
    • trait BigDecimalConverter[T]

@yjshen
Copy link
Member Author

yjshen commented Jun 24, 2015

@chenghao-intel , refactored eval and moved type specific branching out.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37221 has finished for PR 6938 at commit 392b65b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(child: Expression, scale: Expression)

override def left: Expression = child
override def right: Expression = scale

override def children: Seq[Expression] = Seq(child, scale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you need this


// round_scale > current_scale would result in precision increase
// and not allowed by o.a.s.s.types.Decimal.changePrecision, therefore null
(0 to 7).foreach { i =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i -> scale

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also using i for array index here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - can you at least move bdResults closer to this loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@rxin
Copy link
Contributor

rxin commented Jul 15, 2015

OK I'm going to merge this. Please submit a patch to fix the minor comments.

@asfgit asfgit closed this in f0e1297 Jul 15, 2015
}
}

override def prettyName: String = "round"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this, since the expression is already named Round

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37326 has finished for PR 6938 at commit 07a124c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Round(child: Expression, scale: Expression)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants