
[SPARK-24812][SQL] Last Access Time in the table description is not valid #21775

Closed
wants to merge 1 commit into from


sujith71955
Contributor

@sujith71955 commented Jul 15, 2018

What changes were proposed in this pull request?

The Last Access Time is always displayed as a wrong date, Thu Jan 01 05:30:00 IST 1970, when a user runs the DESC FORMATTED table command.
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date. This appears to be a limitation in Hive as well, so it is better to follow the Hive behavior until the limitation is resolved in Hive.

Spark client output: [screenshot: spark_desc table]

Hive client output: [screenshot: hive_behaviour]
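The dates quoted in this PR look like timestamps at or just before the Unix epoch, rendered in the viewer's local time zone: Thu Jan 01 05:30:00 IST 1970 is epoch millisecond 0 seen from India (UTC+05:30), and the commit message's Wed Dec 31 15:59:59 PST 1969 is a value just before the epoch (for example -1 ms) seen from US Pacific time (UTC-08:00). The minimal sketch below illustrates this using only `java.time`; the object and method names are hypothetical, and the zone abbreviation is left out of the pattern so the output does not depend on locale-specific zone names.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.Locale

object EpochRendering {
  // Same field order as java.util.Date#toString, minus the zone abbreviation.
  private val pattern =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss yyyy", Locale.US)

  // Render an epoch-millisecond value as local wall-clock time in `zone`.
  def render(epochMillis: Long, zone: String): String =
    pattern.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zone)))

  def main(args: Array[String]): Unit = {
    // 0 ms (the epoch itself) viewed from India Standard Time.
    println(render(0L, "Asia/Kolkata"))         // Thu Jan 01 05:30:00 1970
    // -1 ms (one millisecond before the epoch) viewed from US Pacific time.
    println(render(-1L, "America/Los_Angeles")) // Wed Dec 31 15:59:59 1969
  }
}
```

In other words, a "never recorded" placeholder near zero is formatted as a real date, which is why showing UNKNOWN instead is less misleading.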

How was this patch tested?

A unit test has been added to ensure that the wrong date "Thu Jan 01 05:30:00 IST 1970" is no longer shown as the value of the Last Access property.

val lastAcessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name"))
.get("col_name").getOrElse("").equals("Last Access"))
// Check whether lastAcessField key is exist
assert(!lastAcessField.isEmpty)
Member

nit: assert(lastAccessField.nonEmpty) instead of assert(!lastAcessField.isEmpty)

sql(s"create table" +
s" if not exists t1 (c1_int int, c2_string string, c3_float float)")
val desc = sql("DESC FORMATTED t1").collect().toSeq
val lastAcessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name"))
Member

nit: typo, lastAcessField should be lastAccessField

val lastAccess = {
if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString
}
map.put("Last Access", lastAccess)
Member

Is the val lastAccess needed? It could be inlined:

map.put("Last Access",
      if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString)

Member

the current way is also fine
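To make the two variants discussed above concrete, here is a self-contained sketch of the formatting rule under review. It assumes, as the reviewed code does, that -1 marks a last access time that was never recorded; `LastAccessDisplay`, `putWithVal` and `putInline` are hypothetical names, not Spark APIs.

```scala
import java.util.Date
import scala.collection.mutable

object LastAccessDisplay {
  // Assumption: -1 is the "not recorded" sentinel, as in the check under review.
  val UnknownSentinel: Long = -1L

  // Variant 1: bind the display value to a val, then put it (the PR's version).
  def putWithVal(map: mutable.Map[String, String], lastAccessTime: Long): Unit = {
    val lastAccess =
      if (lastAccessTime == UnknownSentinel) "UNKNOWN"
      else new Date(lastAccessTime).toString
    map.put("Last Access", lastAccess)
  }

  // Variant 2: the reviewer's inlined form; it writes the same entry.
  def putInline(map: mutable.Map[String, String], lastAccessTime: Long): Unit =
    map.put("Last Access",
      if (lastAccessTime == UnknownSentinel) "UNKNOWN"
      else new Date(lastAccessTime).toString)
}
```

Both variants produce identical map contents, so the choice between them is purely stylistic, which matches the conclusion that the current way is also fine.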

.get("col_name").getOrElse("").equals("Last Access"))
// Check whether lastAcessField key is exist
assert(!lastAcessField.isEmpty)
val validLastAcessFieldValue = lastAcessField.filterNot((r: Row) => ((r
Member

where is the val validLastAcessFieldValue used?

val validLastAcessFieldValue = lastAcessField.filterNot((r: Row) => ((r
.getValuesMap(Seq("data_type"))
.get("data_type").contains(new Date(-1).toString))))
assert(lastAcessField.size!=0)
Member

code style nit: add a space before and after '!='

@HyukjinKwon
Member

ok to test

@HyukjinKwon
Member

Seems to make sense.

seems to be a limitation as of now even from hive, better we can follow the hive behavior unless the limitation has been resolved from hive.

Do you know the related Hive-side ticket, or can you point me to the related code?

test("desc formatted table for last access verification") {
withTable("t1") {
sql(s"create table" +
s" if not exists t1 (c1_int int, c2_string string, c3_float float)")
Member

nit:

      sql(
        "CREATE TABLE IF NOT EXISTS t1 (c1_int INT, c2_string STRING, c3_float FLOAT)")

s" if not exists t1 (c1_int int, c2_string string, c3_float float)")
val desc = sql("DESC FORMATTED t1").collect().toSeq
val lastAccessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name"))
.get("col_name").getOrElse("").equals("Last Access"))
Member

Can we simplify this via, for instance, desc.filter($"col_name".startsWith("...")).select("data_type")?
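The suggested simplification can be illustrated without Spark. In this sketch, `DescRow` is a hypothetical stand-in for a row of `DESC FORMATTED` output (the real test works on Spark `Row`s or a DataFrame), `lastAccessValues` mirrors `filter(... startsWith("Last Access")).select("data_type")`, and the two assertions (the row exists and holds `UNKNOWN`) are an assumption about what such a test would check, not the merged test itself.

```scala
// Hypothetical stand-in for one (col_name, data_type) row of DESC FORMATTED output.
final case class DescRow(colName: String, dataType: String)

object LastAccessCheck {
  // Keep only rows whose col_name starts with "Last Access" and project data_type.
  def lastAccessValues(desc: Seq[DescRow]): Seq[String] =
    desc.filter(_.colName.startsWith("Last Access")).map(_.dataType)

  def main(args: Array[String]): Unit = {
    // Illustrative input only; not real command output.
    val desc = Seq(
      DescRow("Database", "default"),
      DescRow("Table", "t1"),
      DescRow("Last Access", "UNKNOWN"))
    val values = lastAccessValues(desc)
    assert(values.nonEmpty, "Last Access row is missing")
    assert(values.forall(_ == "UNKNOWN"), s"unexpected Last Access value: $values")
    println(values)
  }
}
```

Projecting the value column first keeps each assertion to a single condition, which also avoids computing a filtered value that is never used.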

@@ -2250,6 +2251,22 @@ class HiveDDLSuite
}
}

test("desc formatted table for last access verification") {
Member

Let's name it "SPARK-24812: desc formatted table for last access verification".

@SparkQA

SparkQA commented Jul 20, 2018

Test build #93311 has finished for PR 21775 at commit b527fdc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

This is a user-facing behavior change. Please add an entry to the migration guide.

@sujith71955
Contributor Author

Sure, I will update the PR based on the comments. Thanks for the suggestions.

@SparkQA

SparkQA commented Jul 22, 2018

Test build #93406 has finished for PR 21775 at commit e9b0f91.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2018

Test build #93407 has finished for PR 21775 at commit 2008c21.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sujith71955
Contributor Author

@HyukjinKwon
seems to be a limitation as of now even from hive, better we can follow the hive behavior unless the limitation has been resolved from hive.

HIVE-2526 is the JIRA related to this issue; I could not find any other open JIRAs mentioning this problem.

@SparkQA

SparkQA commented Jul 22, 2018

Test build #93409 has finished for PR 21775 at commit 140c4ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93421 has finished for PR 21775 at commit 140c4ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1843,6 +1843,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

## Upgrading From Spark SQL 2.3 to 2.4

- Since Spark 2.4, Spark will display hive table description column `Last Access` value as `UNKNOWN` following the Hive system.
Member

This applies to both native and Hive tables. How about changing it to:

Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.

Contributor Author

Right, it's applicable to both types. I will update the message as per your comment. Thanks.

Contributor Author

done.

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93431 has finished for PR 21775 at commit 76a34c6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

ok to test

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 23, 2018

Test build #93437 has finished for PR 21775 at commit 76a34c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…alid

## What changes were proposed in this pull request?
The Last Access Time is always displayed as a wrong date, Wed Dec 31 15:59:59 PST 1969, when a user runs the DESC FORMATTED table command.
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date. This appears to be a current limitation, so it is better to follow the Hive behavior
until the limitation is resolved in Hive.

## How was this patch tested?
A unit test has been added to ensure that the wrong date "Wed Dec 31 15:59:59 PST 1969"
is no longer shown as the value of the Last Access property.
@sujith71955
Contributor Author

@HyukjinKwon @gatorsmile All issues have been addressed. Please let me know how this patch looks. Thanks.

@gatorsmile
Member

retest this please

@gatorsmile
Member

The commit has been tested.

LGTM Thanks! Merged to master.

@asfgit asfgit closed this in d4a277f Jul 24, 2018
@SparkQA

SparkQA commented Jul 24, 2018

Test build #93513 has finished for PR 21775 at commit 76a34c6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

5 participants