Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12258] [SQL] Hive Timestamp UDF is binded with '1969-12-31 15:59:59.999999' for null value #10249

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Copy link
Member

This PR is for fixing the following test case:

  test("Timestamp UDF and Null value") {
    hiveContext.runSqlHive("CREATE TABLE ts_test (ts TIMESTAMP) STORED AS TEXTFILE")
    hiveContext.runSqlHive("INSERT INTO TABLE ts_test VALUES(Null)")
    hiveContext.udf.register("dummy",
      (ts: Timestamp) => ts
    )
    val result = hiveContext.sql("SELECT dummy(ts) FROM ts_test").collect().mkString("\n")
    assertResult("[null]")(result)
  }

If users use BigDecimal, Date and Timestamp in UDF, UDF returns null if the input is null.

@cloud-fan I just followed your PR #9770 and add three more types. Could you review this PR?

Thank you!

@cloud-fan
Copy link
Contributor

Can you add more details to explain why Hive Timestamp UDF is binded with '1969-12-31 15:59:59.999999' for null value? It looks weird to me and is diffrent from the primitive problems I fixed in that PR.

For primitive types, there is no way to express null unless user use boxed type in UDF parameters, but Timestamp doesn't have this limit.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47522 has finished for PR 10249 at commit 0dfa110.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

The timestamp is bound to this specific number because CodeGen uses -1L as its default (null) value for Timestamp (assuming that your timezone is GMT-8)

@davies
Copy link
Contributor

davies commented Dec 10, 2015

I think the root cause is that the UDF does not handle null value for DateType/TimestampType/DecimalType correctly, we should pass null in instead of skipping the UDF (returning null).

@gatorsmile
Copy link
Member Author

Thank you for your comments! @cloud-fan @hvanhovell @davies

After digging it further, I just realized BigDecimal does not have such an issue. In the codeGenerator.java, we always treats TimestampType as LongType and treats DateType as IntegerType.

  def javaType(dt: DataType): String = dt match {
    case BooleanType => JAVA_BOOLEAN
    case ByteType => JAVA_BYTE
    case ShortType => JAVA_SHORT
    case IntegerType | DateType => JAVA_INT
    case LongType | TimestampType => JAVA_LONG
    case FloatType => JAVA_FLOAT

In all the codegen codes, we do not have separate logic for processing DateType and TimestampType. I am wondering if the current fix is ok?

Thank you again!

@yhuai
Copy link
Contributor

yhuai commented Dec 10, 2015

test this please

@yhuai
Copy link
Contributor

yhuai commented Dec 10, 2015

@gatorsmile Do we have other data types that still have this issue?

@gatorsmile
Copy link
Member Author

Let me do more tests. Will let you know soon.

@yhuai
Copy link
Contributor

yhuai commented Dec 10, 2015

test this please

@yhuai
Copy link
Contributor

yhuai commented Dec 10, 2015

@gatorsmile @davies has a fix that should fix the problem fundamentally (#10259). You can take a look and try it out.

@gatorsmile
Copy link
Member Author

Let me close it. Thanks!

@gatorsmile gatorsmile closed this Dec 10, 2015
asfgit pushed a commit that referenced this pull request Dec 11, 2015
Check nullability and passing them into ScalaUDF.

Closes #10249

Author: Davies Liu <davies@databricks.com>

Closes #10259 from davies/udf_null.

(cherry picked from commit b1b4ee7)
Signed-off-by: Yin Huai <yhuai@databricks.com>
asfgit pushed a commit that referenced this pull request Dec 11, 2015
Check nullability and passing them into ScalaUDF.

Closes #10249

Author: Davies Liu <davies@databricks.com>

Closes #10259 from davies/udf_null.
@SparkQA
Copy link

SparkQA commented Dec 11, 2015

Test build #47547 has finished for PR 10249 at commit 446d1fa.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile deleted the udfNull branch December 30, 2015 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants