[SPARK-9526][SQL] Utilize randomized tests to reveal potential bugs in sql expressions #7855

Closed
wants to merge 7 commits into from

yjshen
Member

yjshen commented Aug 1, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow-up to #7830, aiming to use randomized tests to reveal more potential bugs in SQL expressions.

@yjshen
Member Author

yjshen commented Aug 1, 2015

Opening this early to get high-level feedback ASAP.

Note: the current merge build should fail due to two bugs (a third suspected failure, item 3 below, turned out not to be a problem):

  1. UnaryMinus's codegen version would fail to compile when the input is Long.MinValue.
  2. Remainder would fail because codegen and interpreted modes return different results for the same input.
  3. (Not a problem) MaxOf/MinOf would fail due to a ClassCastException: BinaryType's ordering needs Array[Byte] as input, but GenericArrayData is given.

These bugs are not fixed yet since I just finished prototyping.
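One plausible way for bug 1 to arise, shown as a minimal Scala sketch: assume a string-template code generator that splices the child's generated Java text directly after a `-`. The object and helper names (`UnaryMinusCodegenSketch`, `longLiteral`, `naiveNegate`, `parenthesizedNegate`) are hypothetical and are not Spark's codegen API. Java's lexer reads `--` as the decrement operator, so the naive template turns the literal for Long.MinValue into source that cannot compile, while parenthesizing the operand keeps the tokens apart.

```scala
// Hypothetical helpers modelling a string-template code generator (not Spark's API).
object UnaryMinusCodegenSketch {
  // Java source text for a long literal, e.g. "-9223372036854775808L" for Long.MinValue.
  def longLiteral(v: Long): String = s"${v}L"

  // Naive template: prefix "-" to the child's code. For a negative literal this
  // yields "--...", which javac tokenizes as the decrement operator (compile error).
  def naiveNegate(childCode: String): String = s"-$childCode"

  // Parenthesized template: the "-" can never fuse with a leading "-" in the child.
  def parenthesizedNegate(childCode: String): String = s"(-($childCode))"
}
```

For example, `naiveNegate(longLiteral(Long.MinValue))` yields `--9223372036854775808L`, whereas `parenthesizedNegate(longLiteral(Long.MinValue))` yields `(-(-9223372036854775808L))`. Once the code compiles, negating Long.MinValue wraps around to Long.MinValue on the JVM, so the generated code should then agree with an interpreted path that also uses plain two's-complement negation.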

@yjshen
Member Author

yjshen commented Aug 1, 2015

cc @rxin @davies

@JoshRosen
Contributor

For Remainder, my hunch is that it's probably failing for extreme floating-point values (e.g. taking the remainder of a giant float by another giant float). I found a similar failure in #7625, an experimental branch of mine that contains some code for using reflection to write tests against all Expression subclasses.

The code in my branch lags a bit behind what I have locally (e.g. it may be missing some of the interpreted vs. codegen comparison code), so I can look into pushing the rest of my changes later. The approach in my branch definitely isn't the right one for unit testing; it was intended more as an experiment to see whether it would be possible to do all of this via reflection.
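To make the hunch about extreme values concrete, here is a hedged Scala sketch of an edge-case-biased random generator used for a differential check. It compares the JVM's `%` (a truncated remainder, like C's fmod) against a hypothetical naive reimplementation that truncates the quotient through `Long`; neither is Spark's actual Remainder code, and `RemainderPropertySketch` and its members are illustrative names. For giant inputs such as 1e300 the quotient saturates at Long.MaxValue and the two results diverge completely; biasing the generator toward such values makes failures like this surface quickly.

```scala
import scala.util.Random

object RemainderPropertySketch {
  // Extreme values that uniform sampling almost never produces.
  val edgeCases: Vector[Double] = Vector(
    0.0, -0.0, 1.0, -1.0, 1e300, -1e300, Double.MaxValue, Double.MinValue,
    Double.MinPositiveValue, Double.PositiveInfinity, Double.NegativeInfinity, Double.NaN)

  // Roughly one draw in four comes from the edge cases; the rest are ordinary values.
  def genDouble(rnd: Random): Double =
    if (rnd.nextInt(4) == 0) edgeCases(rnd.nextInt(edgeCases.length))
    else rnd.nextDouble() * 1000 - 500

  // Reference: the JVM's % operator (truncated remainder).
  def reference(a: Double, b: Double): Double = a % b

  // Hypothetical naive variant: truncates the quotient through Long, which
  // saturates at Long.MaxValue / Long.MinValue for giant quotients.
  def naive(a: Double, b: Double): Double = a - b * (a / b).toLong

  // NaN never equals itself, so treat two NaNs as agreeing.
  def sameResult(x: Double, y: Double): Boolean = (x.isNaN && y.isNaN) || x == y

  // Returns the first (a, b) pair on which the two implementations disagree.
  def findCounterexample(seed: Long, trials: Int): Option[(Double, Double)] = {
    val rnd = new Random(seed)
    Iterator.fill(trials)((genDouble(rnd), genDouble(rnd)))
      .find { case (a, b) => !sameResult(reference(a, b), naive(a, b)) }
  }
}
```

With a fixed seed the search is reproducible, so a counterexample it finds (or a hand-picked one such as (1e300, 1.0), where `reference` returns 0.0 but `naive` returns 1e300) can then be pinned as a deterministic regression test.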

@SparkQA

SparkQA commented Aug 1, 2015

Test build #39363 has finished for PR 7855 at commit daffd80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(override val uid: String)

@yjshen
Member Author

yjshen commented Aug 2, 2015

@JoshRosen, thanks for the pointer to #7625, it's great!
I'll read it in detail and see how I can refine my implementation accordingly.

@SparkQA

SparkQA commented Aug 2, 2015

Test build #39409 has finished for PR 7855 at commit e3bbe4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class RequestExecutors(appId: String, requestedTotal: Int)
    • case class KillExecutors(appId: String, executorIds: Seq[String])
    • class SpecificSafeProjection extends $
    • case class FromUTCTimestamp(left: Expression, right: Expression)
    • case class ToUTCTimestamp(left: Expression, right: Expression)
    • case class DateDiff(endDate: Expression, startDate: Expression)
    • case class InitCap(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

@yjshen yjshen changed the title [SPARK-9526][SQL][WIP] Utilize randomized tests to reveal potential bugs in sql expressions [SPARK-9526][SQL] Utilize randomized tests to reveal potential bugs in sql expressions Aug 2, 2015
@SparkQA

SparkQA commented Aug 2, 2015

Test build #39417 has finished for PR 7855 at commit 42769b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class RequestExecutors(appId: String, requestedTotal: Int)
    • case class KillExecutors(appId: String, executorIds: Seq[String])
    • class SpecificSafeProjection extends $
    • case class FromUTCTimestamp(left: Expression, right: Expression)
    • case class ToUTCTimestamp(left: Expression, right: Expression)
    • case class DateDiff(endDate: Expression, startDate: Expression)
    • case class InitCap(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

@JoshRosen
Contributor

Did this end up finding any new bugs?

@yjshen
Member Author

yjshen commented Aug 3, 2015

All bugs revealed so far:

  1. UnaryMinus's codegen version would fail to compile when the input is Long.MinValue.
  2. Remainder would fail because codegen and interpreted modes return different results for the same input (yes, when taking the remainder of giant values).
  3. BinaryComparison would fail to compile in codegen mode when comparing Boolean types.
  4. AddMonth would fail if passed a huge negative number of months, which would lead to accessing a negative index of the monthDays array.

I also fixed Nanvl by upcasting its operands if they are of different types.
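Bug 4 follows a common pattern: on the JVM, `%` keeps the sign of the dividend, so a large negative month offset yields a negative array index. The sketch below assumes a zero-based month index into a hypothetical `monthDays` table; `AddMonthsSketch` and its methods are illustrative only, not Spark's AddMonth implementation.

```scala
object AddMonthsSketch {
  // Days per month for a non-leap year, indexed 0 (January) to 11 (December).
  private val monthDays = Array(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

  // Naive index: % keeps the sign of the dividend, so a large negative
  // month offset produces a negative index.
  def naiveMonthIndex(month0: Int, numMonths: Int): Int = (month0 + numMonths) % 12

  // Math.floorMod always returns a value in [0, 12); widening to Long also
  // avoids Int overflow when numMonths is close to Int.MinValue.
  def safeMonthIndex(month0: Int, numMonths: Int): Int =
    Math.floorMod(month0.toLong + numMonths, 12L).toInt

  // Looks up the day count of the target month using the safe index.
  def daysInTargetMonth(month0: Int, numMonths: Int): Int =
    monthDays(safeMonthIndex(month0, numMonths))
}
```

For example, starting from January (index 0) and moving back 13 months, `naiveMonthIndex` returns -1, which would throw an ArrayIndexOutOfBoundsException on lookup, while `safeMonthIndex` returns 11 (December).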

val numericTypeWithoutDecimal: Set[DataType] = integralType ++ Set(DoubleType, FloatType)

/**
* Instances of all [[NumericType]]s and CalendarIntervalType
Contributor

Put [[ ]] around CalendarIntervalType so IntelliJ can find it during refactoring.

@rxin
Contributor

rxin commented Aug 3, 2015

@yjshen

To help with reviewing, and to separate important fixes from nice-to-have tests, can you submit a separate pull request that includes all the bug fixes, along with deterministic unit tests that trigger those cases?

Then this pull request can be just about the randomized tests.

@JoshRosen
Contributor

Bugfixes were done in #7882, so this should be ready for rebasing.

@yjshen
Member Author

yjshen commented Aug 4, 2015

Ah, I forgot the Scaladoc on the property check; will add it now.

@JoshRosen
Contributor

This is on my review queue for tomorrow.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #39650 has finished for PR 7855 at commit b2c6543.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class SortedIterator extends UnsafeSorterIterator
    • public class KVSorterIterator extends KVIterator<UnsafeRow, UnsafeRow>

@SparkQA

SparkQA commented Aug 4, 2015

Test build #39676 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Aug 4, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #199 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Aug 4, 2015

Unrelated failures again and again:
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.(It is not a test)
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.(It is not a test)

@yjshen
Member Author

yjshen commented Aug 4, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #39696 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #203 has finished for PR 7855 at commit 5301891.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Aug 4, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #204 has finished for PR 7855 at commit 5301891.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2015

Test build #39702 has finished for PR 7855 at commit 5301891.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -211,4 +215,80 @@ trait ExpressionEvalHelper {
plan(inputRow)).get(0, expression.dataType)
assert(checkResult(actual, expected))
}

def checkConsistency(dt: DataType, clazz: Class[_]): Unit = {
Contributor

What do you think about giving this a more specific name, such as checkConsistencyBetweenInterpretedAndCodegen? It would also be good to add Scaladoc to these methods to explain what they're doing, since the use of reflection might be non-obvious.

Contributor

For instance, this method's Scaladoc could explain that it tests the expression's one-argument constructor with randomized literals of the given data type.

Contributor

Also, I think that we might be able to clean up the code slightly by adding a type parameter to this method:

def checkConsistency[E <: Expression: ClassTag](dt: DataType)

to let callers write something like

checkConsistencyBetweenInterpretedAndCodegen[Sinh](DoubleType)

@JoshRosen
Contributor

The basic approach here seems reasonable to me, but I left a couple of comments regarding whether we need to use reflection, as well as some documentation and naming issues.
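The review suggestions above (a more specific name, Scaladoc explaining the check, and a concise call site) can be sketched without reflection by passing the expression's constructor as a function. This is a minimal sketch built on hypothetical, simplified stand-ins (`Expr`, `Lit`, `Sinh`, `NaiveSinh`, with `eval` modelling the interpreted path and `evalGenerated` modelling the code-generated path); it is not Catalyst's API, and it uses a plain seeded loop instead of ScalaCheck's `forAll`.

```scala
import scala.util.Random

// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's API):
// eval models interpreted evaluation, evalGenerated models the codegen path.
sealed trait Expr {
  def eval(): Double
  def evalGenerated(): Double
}

final case class Lit(value: Double) extends Expr {
  def eval(): Double = value
  def evalGenerated(): Double = value
}

// Both paths delegate to math.sinh, so they always agree.
final case class Sinh(child: Expr) extends Expr {
  def eval(): Double = math.sinh(child.eval())
  def evalGenerated(): Double = math.sinh(child.evalGenerated())
}

// Deliberately inconsistent variant: the "generated" path uses the textbook
// formula, which overflows early (math.exp(710.0) is Infinity, but
// math.sinh(710.0) is finite).
final case class NaiveSinh(child: Expr) extends Expr {
  def eval(): Double = math.sinh(child.eval())
  def evalGenerated(): Double = {
    val x = child.evalGenerated()
    (math.exp(x) - math.exp(-x)) / 2
  }
}

object ConsistencyCheck {
  // Extreme inputs that are always checked before the random ones.
  val edgeCases: Seq[Double] = Seq(
    0.0, -0.0, 710.0, -710.0, Double.MaxValue, Double.MinPositiveValue,
    Double.PositiveInfinity, Double.NegativeInfinity, Double.NaN)

  // NaN never equals itself, so two NaNs count as agreeing.
  private def sameResult(a: Double, b: Double): Boolean =
    (a.isNaN && b.isNaN) || a == b

  /**
   * Wraps edge-case and random Double literals with `build` (for example `Sinh(_)`),
   * evaluates each expression on both paths, and returns every input on which
   * the results differ. An empty result means no inconsistency was found.
   */
  def checkConsistencyBetweenInterpretedAndCodegen(
      build: Expr => Expr,
      trials: Int = 1000,
      seed: Long = 42L): Seq[Double] = {
    val rnd = new Random(seed)
    val inputs = edgeCases ++ Seq.fill(trials)(rnd.nextGaussian() * 100)
    inputs.filterNot { x =>
      val expr = build(Lit(x))
      sameResult(expr.eval(), expr.evalGenerated())
    }
  }
}
```

With the constructor passed as a function, a call site reads `checkConsistencyBetweenInterpretedAndCodegen(Sinh(_))`, close to the `[Sinh](DoubleType)` form suggested in the review, and no reflective constructor lookup is needed. In this sketch the Sinh check returns no mismatches, while the NaiveSinh check reports 710.0 among others.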

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40944 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode

@yjshen
Member Author

yjshen commented Aug 15, 2015

@JoshRosen, I've changed my implementation; do you mind reviewing this again?

@JoshRosen
Contributor

LGTM pending Jenkins; thanks!

@SparkQA

SparkQA commented Aug 16, 2015

Test build #1627 has finished for PR 7855 at commit 0a5bdc9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode

@yjshen
Member Author

yjshen commented Aug 17, 2015

jenkins, retest this please.

@yjshen
Member Author

yjshen commented Aug 17, 2015

Unrelated failure: org.apache.spark.sql.hive.HiveSparkSubmitSuite.SPARK-8368: includes jars passed in through --jars

@yjshen
Member Author

yjshen commented Aug 17, 2015

jenkins, retest this please.

@SparkQA

SparkQA commented Aug 17, 2015

Test build #41004 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks

@rxin
Contributor

rxin commented Aug 17, 2015

@JoshRosen I will let you merge this one.

@JoshRosen
Contributor

Will merge provided that this still compiles.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 17, 2015

Test build #41043 has finished for PR 7855 at commit 0a5bdc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks

@JoshRosen
Contributor

Alright, merging this to master and branch-1.5. Thanks!

asfgit pushed a commit that referenced this pull request Aug 17, 2015
…in sql expressions

JIRA: https://issues.apache.org/jira/browse/SPARK-9526

This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #7855 from yjshen/property_check.

(cherry picked from commit b265e28)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in b265e28 Aug 17, 2015
4 participants