New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24812][SQL] Last Access Time in the table description is not valid #21775
Conversation
val lastAcessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name")) | ||
.get("col_name").getOrElse("").equals("Last Access")) | ||
// Check whether lastAcessField key is exist | ||
assert(!lastAcessField.isEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastAccessField.nonEmpty
sql(s"create table" + | ||
s" if not exists t1 (c1_int int, c2_string string, c3_float float)") | ||
val desc = sql("DESC FORMATTED t1").collect().toSeq | ||
val lastAcessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lastAccessField
val lastAccess = { | ||
if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString | ||
} | ||
map.put("Last Access", lastAccess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the val lastAccess?
map.put("Last Access",
if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current way is also fine
.get("col_name").getOrElse("").equals("Last Access")) | ||
// Check whether lastAcessField key is exist | ||
assert(!lastAcessField.isEmpty) | ||
val validLastAcessFieldValue = lastAcessField.filterNot((r: Row) => ((r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the val validLastAcessFieldValue
used?
val validLastAcessFieldValue = lastAcessField.filterNot((r: Row) => ((r | ||
.getValuesMap(Seq("data_type")) | ||
.get("data_type").contains(new Date(-1).toString)))) | ||
assert(lastAcessField.size!=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code style nit: blank before and after '!='
ok to test |
Seems making sense.
Do you maybe know related Hive side ticket or can you point me out the related codes? |
test("desc formatted table for last access verification") { | ||
withTable("t1") { | ||
sql(s"create table" + | ||
s" if not exists t1 (c1_int int, c2_string string, c3_float float)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
sql(
"CREATE TABLE IF NOT EXISTS t1 (c1_int INT, c2_string STRING, c3_float FLOAT)")
s" if not exists t1 (c1_int int, c2_string string, c3_float float)") | ||
val desc = sql("DESC FORMATTED t1").collect().toSeq | ||
val lastAccessField = desc.filter((r: Row) => r.getValuesMap(Seq("col_name")) | ||
.get("col_name").getOrElse("").equals("Last Access")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify this via, for instance, desc.filter($"col_name".startswith("...")).select("data_type")
?
@@ -2250,6 +2251,22 @@ class HiveDDLSuite | |||
} | |||
} | |||
|
|||
test("desc formatted table for last access verification") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's name it SPARK-24812: desc formatted table for last access verification
Test build #93311 has finished for PR 21775 at commit
|
This is an external change. Please add one more point in the migration guide. |
sure, i will update the PR based on the comments, Thanks for suggestions. |
Test build #93406 has finished for PR 21775 at commit
|
Test build #93407 has finished for PR 21775 at commit
|
@HyukjinKwon
|
Test build #93409 has finished for PR 21775 at commit
|
retest this please |
Test build #93421 has finished for PR 21775 at commit
|
docs/sql-programming-guide.md
Outdated
@@ -1843,6 +1843,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see | |||
|
|||
## Upgrading From Spark SQL 2.3 to 2.4 | |||
|
|||
- Since Spark 2.4, Spark will display hive table description column `Last Access` value as `UNKNOWN` following the Hive system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is applicable to both native and hive tables. How about changing it to
Spark will display table description column
Last Access
value asUNKNOWN
when the value wasJan 01 1970
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, its applicable for both type, i will update the message as per your comment. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Test build #93431 has finished for PR 21775 at commit
|
ok to test |
retest this please |
Test build #93437 has finished for PR 21775 at commit
|
…alid ## What changes were proposed in this pull request? Last Access Time will always displayed wrong date Wed Dec 31 15:59:59 PST 1969 when user run DESC FORMATTED table command In hive its displayed as "UNKNOWN" which makes more sense than displaying wrong date. seems to be a limitation as of now, better we can follow the hive behavior unless the limitation has been resolved from hive. ## How was this patch tested? UT has been added which makes sure that the wrong date "Wed Dec 31 15:59:59 PST 1969 " shall not be added as value for the Last Access property
@HyukjinKwon @gatorsmile All issues has been addressed, please let me know how this patch looks like. Thanks . |
retest this please |
The commit has been tested. LGTM Thanks! Merged to master. |
Test build #93513 has finished for PR 21775 at commit
|
What changes were proposed in this pull request?
Last Access Time will always displayed wrong date Thu Jan 01 05:30:00 IST 1970 when user run DESC FORMATTED table command
In hive its displayed as "UNKNOWN" which makes more sense than displaying wrong date. seems to be a limitation as of now even from hive, better we can follow the hive behavior unless the limitation has been resolved from hive.
spark client output
Hive client output
How was this patch tested?
UT has been added which makes sure that the wrong date "Thu Jan 01 05:30:00 IST 1970 "
shall not be added as value for the Last Access property