[SPARK-27638][SQL]: Cast string to date/timestamp in binary comparisons with dates/timestamps #24567

Closed
wants to merge 7 commits into from
@@ -120,11 +120,11 @@ object TypeCoercion {
*/
private def findCommonTypeForBinaryComparison(
dt1: DataType, dt2: DataType, conf: SQLConf): Option[DataType] = (dt1, dt2) match {
// We should cast all relative timestamp/date/string comparison into string comparisons
case (StringType, DateType) => Some(DateType)
Member

I have removed these 2 cases but the added test still completes successfully. We need a few tests that fail without the changes in findCommonTypeForBinaryComparison.

Member

Contributor Author

Done.
Thanks for pointing it out. It simplifies the code a lot.

case (DateType, StringType) => Some(DateType)
Member

Doesn't this mean we always find the common type as date when any arbitrary strings are compared to any dates?

Contributor Author

Yes. Is there any issue with that in your opinion?
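
To make the question above concrete, here is a minimal sketch of what casting the string side to a date implies for arbitrary strings. It assumes a failed cast yields NULL (modelled as `None`), as a non-ANSI cast in Spark does; `castToDate`, `dateGreaterOrEqual`, and `rowKept` are hypothetical names for illustration, not Spark APIs, and the strict ISO parser is only a stand-in.

```scala
import java.time.LocalDate
import scala.util.Try

object DateCoercionSketch {
  // Hypothetical stand-in for casting a string to a date; None models SQL NULL.
  def castToDate(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s.trim)).toOption

  // d >= s with SQL-style null propagation: None means "unknown".
  def dateGreaterOrEqual(d: LocalDate, s: String): Option[Boolean] =
    castToDate(s).map(parsed => !d.isBefore(parsed))

  // A WHERE clause keeps a row only when the predicate is definitely true.
  def rowKept(d: LocalDate, s: String): Boolean =
    dateGreaterOrEqual(d, s).contains(true)
}
```

Under these assumptions a string that is not a date makes the predicate unknown, so the row is filtered out instead of being compared character by character.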

// We should cast all relative timestamp/string comparison into string comparisons
Contributor

If it's justified to do this for date, I think we should do it for timestamp as well.

Contributor Author

Done

// This behaves as a user would expect because timestamp strings sort lexicographically.
// i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
case (StringType, DateType) => Some(StringType)
case (DateType, StringType) => Some(StringType)
case (StringType, TimestampType) => Some(StringType)
case (TimestampType, StringType) => Some(StringType)
case (StringType, NullType) => Some(StringType)
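
The string-based cases above rely on date strings sorting lexicographically. The following sketch, in plain Scala without Spark, illustrates where that holds and where it breaks for literals that are not zero-padded. `stringGe` and `dateGe` are hypothetical helpers that only approximate the two coercion rules.

```scala
import java.time.LocalDate

object LexicographicVsDate {
  val d: LocalDate = LocalDate.of(2000, 1, 1)

  // String rule: the date is rendered as "2000-01-01" and compared character by character.
  def stringGe(date: LocalDate, literal: String): Boolean =
    date.toString.compareTo(literal) >= 0

  // Date rule, simplified: both sides are compared as dates.
  def dateGe(date: LocalDate, literal: LocalDate): Boolean =
    !date.isBefore(literal)
}
```

With these helpers, `stringGe(d, "2000")` is true because the longer string sorts after its prefix, but `stringGe(d, "2000-1-1")` is false because `'0'` sorts before `'1'` at the month position, even though both sides denote the same day. `dateGe(d, LocalDate.of(2000, 1, 1))` is true.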
14 changes: 14 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3024,6 +3024,20 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
sql("reset")
}
}

test("string date comparison") {
Member

Please add test cases for <=> and =.

Contributor Author

Done
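
For context on why `=` and `<=>` deserve separate test cases, here is a small model of the two operators over nullable values, with `None` standing in for SQL NULL. `sqlEq` and `nullSafeEq` are hypothetical names used only for this sketch.

```scala
object NullSafeEqualitySketch {
  // `=`: the result is unknown (None) as soon as either side is NULL.
  def sqlEq[A](l: Option[A], r: Option[A]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // `<=>`: always a definite answer; two NULLs compare as equal.
  def nullSafeEq[A](l: Option[A], r: Option[A]): Boolean =
    l == r
}
```

If a string literal fails to cast to a date and becomes NULL, `=` would yield NULL while `<=>` would yield false against a non-null date, so the two operators can diverge on exactly the inputs this change affects.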

  spark.range(1).selectExpr("date '2000-01-01' as d").createOrReplaceTempView("t1")
  val result = Date.valueOf("2000-01-01")
  checkAnswer(sql("select * from t1 where d >= '2000'"), Row(result))
  checkAnswer(sql("select * from t1 where d >= '2000-1'"), Row(result))
  checkAnswer(sql("select * from t1 where d >= '2000-1-1'"), Row(result))
  checkAnswer(sql("select * from t1 where d >= '2000-1-01'"), Row(result))
  checkAnswer(sql("select * from t1 where d >= '2000-01-1'"), Row(result))
  checkAnswer(sql("select * from t1 where d >= '2000-01-01'"), Row(result))
  checkAnswer(sql("select * from t1 where d > '2000-01-01'"), Nil)
  checkAnswer(sql("select * from t1 where '2000' >= d"), Row(result))
  checkAnswer(sql("select * from t1 where d > '2000-13'"), Nil)
}
}

case class Foo(bar: Option[String])
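
The literals in the `string date comparison` test (`'2000'`, `'2000-1'`, `'2000-1-01'`, `'2000-13'`) only behave as asserted if the string-to-date cast accepts partial and non-padded dates and rejects invalid ones. The sketch below is one way to model that leniency in plain Scala. It is an assumption-laden illustration, not Spark's actual parser, and `LenientDateParseSketch.parse` is a hypothetical name.

```scala
import java.time.LocalDate
import scala.util.Try

object LenientDateParseSketch {
  // Year, optional month, optional day; month and day may be one or two digits.
  private val Partial = """(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?""".r

  // Missing month or day defaults to 1; an out-of-range value yields None (SQL NULL).
  def parse(s: String): Option[LocalDate] = s.trim match {
    case Partial(y, m, d) =>
      val month = Option(m).fold(1)(_.toInt) // unmatched groups are null
      val day = Option(d).fold(1)(_.toInt)
      Try(LocalDate.of(y.toInt, month, day)).toOption
    case _ => None
  }
}
```

In this model `parse("2000")`, `parse("2000-1")`, and `parse("2000-1-01")` all give 2000-01-01, while `parse("2000-13")` gives `None`, which is consistent with the last assertion returning no rows.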