[SPARK-32205][SQL] Writing timestamp to mysql should be datetime type #29043
Conversation
Can one of the admins verify this patch?
```scala
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  // For more details, please see
  // https://dev.mysql.com/doc/refman/5.7/en/datetime.html
  case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
  case _ => None
}
```
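For readers outside the Spark codebase, the dialect hook above can be mimicked with a minimal, self-contained sketch. The `DataType` and `JdbcType` definitions below are simplified stand-ins for Spark's `org.apache.spark.sql.types` and `org.apache.spark.sql.jdbc` classes, not the real API:

```scala
// Minimal stand-ins for Spark's type classes (assumptions, not Spark's real API).
sealed trait DataType
case object TimestampType extends DataType
case object StringType extends DataType

// databaseTypeDefinition is the SQL type name used in CREATE TABLE;
// jdbcNullType is the java.sql.Types code used when binding nulls.
case class JdbcType(databaseTypeDefinition: String, jdbcNullType: Int)

object MySQLDialectSketch {
  // Mirrors the PR's mapping: Spark TimestampType -> MySQL DATETIME.
  def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
    case _             => None // None means: fall back to the default mapping
  }
}
```

Returning `None` for other types lets the generic JDBC mapping apply; only the timestamp case is overridden.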
We need DATETIME in mysql rather than TIMESTAMP.
@srowen Could you please help me check this PR : )
@TJX2014 It seems MySQL TIMESTAMP is affected by the time_zone setting but DATETIME is not (https://www.tech-recipes.com/rx/22599/mysql-datetime-vs-timestamp-data-type/). I think you will lose the time_zone info if you use DATETIME for TIMESTAMP.
Yeah, DATETIME is definitely not the same thing as TIMESTAMP. You need to start with a Spark date-time type, logically.
@huaxingao @srowen Sure, DATETIME and TIMESTAMP both seem to use 8 bytes, while DATE uses only 4; the only difference in mysql seems to be the time zone handling.
... that is a very significant difference in what the 8 bytes mean!
@srowen @huaxingao So could we use
@TJX2014 Actually, regardless of time zone, mapping Spark's timestamp type to DATETIME changes the range-check behavior. Suppose you create a table with a timestamp column.

Before your fix, when inserting the timestamp '1111-01-01 00:00:01' into table test, the mysql JDBC driver does a range check on the value and throws an exception, because '1111-01-01 00:00:01' is not valid (mysql TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC).

After your fix, when inserting the timestamp '1111-01-01 00:00:01' into table test, since you change the data type to DATETIME, the insert succeeds.

I don't have a mysql database and JDBC driver on my local machine to test this, but in theory it works this way. You may want to try it and see if you can insert the timestamp '1111-01-01 00:00:01' OK.
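The range check described above can be sketched independently of the JDBC driver. The helper below is hypothetical (not driver code); it only encodes the documented MySQL ranges: TIMESTAMP covers '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC, while DATETIME covers '1000-01-01 00:00:00' to '9999-12-31 23:59:59':

```scala
import java.time.LocalDateTime

object MySqlRanges {
  // Documented MySQL ranges, treated as naive local datetimes for simplicity.
  private val tsMin = LocalDateTime.parse("1970-01-01T00:00:01")
  private val tsMax = LocalDateTime.parse("2038-01-19T03:14:07")
  private val dtMin = LocalDateTime.parse("1000-01-01T00:00:00")
  private val dtMax = LocalDateTime.parse("9999-12-31T23:59:59")

  def fitsTimestamp(v: LocalDateTime): Boolean =
    !v.isBefore(tsMin) && !v.isAfter(tsMax)

  def fitsDatetime(v: LocalDateTime): Boolean =
    !v.isBefore(dtMin) && !v.isAfter(dtMax)
}

// The value from the PR: rejected by a TIMESTAMP column, accepted by DATETIME.
val v = LocalDateTime.parse("1111-01-01T00:00:01")
```

This makes the PR's motivating case concrete: `fitsTimestamp(v)` is false, so the driver's range check fails for a TIMESTAMP column, while `fitsDatetime(v)` is true.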
@huaxingao Actually, in a real test:
Hi, @srowen |
I think the point is, you would lose time zone information writing as DATETIME, right?
Hi @srowen, the zone info follows the local zone of the mysql server; it seems we should not consider it a time zone loss, because we account for it at write time. The difference between DATETIME and TIMESTAMP is the time zone conversion rather than a loss of information, and that is part of mysql's design.
@TJX2014
Since the user explicitly cast a string to timestamp, I would think the user wants to insert '1970-01-01 00:00:01' as a timestamp data type. Suppose the current time zone on the mysql server is America/Los_Angeles. After the data is inserted into mysql, if the user changes the time zone setting
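The session-time-zone concern can be illustrated with plain java.time: a TIMESTAMP column stores a UTC instant and re-renders it in whatever zone the session uses, while a DATETIME column stores the wall-clock digits as written. A minimal sketch, assuming a session zone of America/Los_Angeles at insert time (the zone names and instant here are illustrative):

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// The instant the user inserted while the session zone was America/Los_Angeles
// (UTC-8 in January 1970): the wall clock read 00:00:01.
val inserted = Instant.parse("1970-01-01T08:00:01Z")

// TIMESTAMP-like behavior: the same stored instant, re-rendered per session zone.
val seenInLA  = inserted.atZone(ZoneId.of("America/Los_Angeles")).format(fmt)
val seenInUTC = inserted.atZone(ZoneId.of("UTC")).format(fmt)

// DATETIME-like behavior: the digits as written, unaffected by any zone change.
val datetimeValue = "1970-01-01 00:00:01"
```

Changing the session zone changes what a TIMESTAMP reads back as (`seenInLA` vs `seenInUTC` differ by eight hours), while `datetimeValue` never moves; that is the trade-off the reviewers are debating.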
@huaxingao
What changes were proposed in this pull request?
Change the MySQL type mapping in org.apache.spark.sql.jdbc.MySQLDialect#getJDBCType.
Add a unit test via org.apache.spark.sql.test.SQLTestUtils#test.
Why are the changes needed?
Because writing a Spark timestamp to mysql should support the '1000-01-01 00:00:00' to '9999-12-31 23:59:59' range.
see https://dev.mysql.com/doc/refman/5.7/en/datetime.html
So the column type created in mysql should be DATETIME rather than TIMESTAMP.
Before this patch, when we let Spark auto-create the mysql table and write a timestamp column:
```scala
sql("select cast('1111-01-01 00:00:01' as timestamp)")
  .toDF("ts")
  .write.mode("append")
  .jdbc("jdbc:mysql://localhost:3306/test", "ts_test3", prop)
```
we get an exception:
```
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect datetime value: '1111-01-01 00:00:01' for column 'ts' at row
```
Does this PR introduce any user-facing change?
Yes. After this patch, users can insert timestamps in the '1000-01-01 00:00:00' to '9999-12-31 23:59:59' range into an auto-created mysql table, because the column is now created as DATETIME rather than TIMESTAMP, and mysql TIMESTAMP does not cover Spark's timestamp range.
How was this patch tested?
Unit test.