[FLINK-11010] [TABLE] Flink SQL timestamp is inconsistent with currentProcessingTime() #7180
Conversation
For more discussion details, please read "Question about Timestamp in Flink SQL" (minimal reproducible example).

result:
Thanks for the contribution @lamber-ken. Can you kindly add a test to justify your change, please?

Thanks for the review @walterddr. Yes, I'm very glad to. Read more about Calcite's SqlFunctions#internalToTime: if I give it an event time with unix timestamp 0, then I get Timestamp(-28800000). I am confused why.

Hi @lamber-ken. The minimal reproduction example should be a good starting point for adding a test/ITCase to justify your change. I also commented on the JIRA ticket.

hi, @walterddr, I have added a SQL proctime test.

hi, @tillrohrmann, the CI build failed, but I can't find anything helpful in the log. How can I trigger a rebuild?
Let me show how the wrong result is generated.

Background: a processing-time tumbling window, Flink 1.5.0.

The invoke stack leads to org.apache.calcite.runtime.SqlFunctions.internalToTimestamp (SqlFunctions.java:1747). After that call returns, execution comes back and we reach `if (windowEndOffset.isDefined) {`. Before it executes, I think the timestamps should be 2018-12-06 13:40:30.0 and 2018-12-06 13:40:33.0, since I am in China (UTC+8).

Okay, let us continue. Now the data will be written to Kafka; before the write, the data is serialized. Using windowEnd as an example: the initial value is 1544074833000 and the final value is 1544046033000. The difference is 28800000 ms, i.e. 8 hours, because I am in China. Why?

The key reason is SqlFunctions.internalToTimestamp: in that code it subtracts LOCAL_TZ, which I think is redundant!
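The subtraction described above can be sketched as follows. This is a minimal reproduction, not Calcite's exact source; the class name and the fixed GMT+8 zone (standing in for the reporter's locale) are assumptions for illustration.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class InternalToTimestampDemo {
    // Assume UTC+8, matching the reporter's environment.
    static final TimeZone LOCAL_TZ = TimeZone.getTimeZone("GMT+8");

    // Mirrors the behavior described for SqlFunctions.internalToTimestamp:
    // the local time zone offset is subtracted from the internal millis.
    public static long internalToTimestamp(long v) {
        return new Timestamp(v - LOCAL_TZ.getOffset(v)).getTime();
    }

    public static void main(String[] args) {
        long windowEnd = 1544074833000L; // 2018-12-06 13:40:33 in UTC+8
        long shifted = internalToTimestamp(windowEnd);
        System.out.println(shifted);             // 1544046033000
        System.out.println(windowEnd - shifted); // 28800000 ms = 8 hours
    }
}
```

This reproduces exactly the 1544074833000 → 1544046033000 shift described above.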
@lzqdename, thanks for your detailed description.

+1 from my side. I had the same problem. Thanks.

Hi @lamber-ken, sorry for the late response. I am a bit confused by the JIRA ticket description and the PR comments. I am under the impression that the goal should be to make all internal times GMT-based regardless of where the code runs.

@walterddr, you are welcome. But the Calcite implementation of the method is as below.
In Flink, when converting Long -> Timestamp, one piece of code in SqlFunctions is used; when converting Timestamp -> Long, a different one is used in the final write step. So we use two different classes to handle the transformation between Long and Timestamp: it is not consistent when handling the time transformation!
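A sketch of that asymmetry, under the assumption that the write path reads the raw millis back without adding the offset that the read path subtracted; the class and method names are illustrative, and a fixed GMT+8 zone is assumed.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class RoundTripDemo {
    static final TimeZone LOCAL_TZ = TimeZone.getTimeZone("GMT+8"); // assumed zone

    // Long -> Timestamp: offset subtracted (internalToTimestamp-style).
    public static Timestamp toTimestamp(long v) {
        return new Timestamp(v - LOCAL_TZ.getOffset(v));
    }

    // Timestamp -> Long as used at write time: raw millis, no offset added back.
    public static long toLong(Timestamp ts) {
        return ts.getTime();
    }

    public static void main(String[] args) {
        long windowEnd = 1544074833000L;
        long written = toLong(toTimestamp(windowEnd));
        System.out.println(windowEnd - written); // 28800000: the 8-hour shift survives the round trip
    }
}
```

If the two conversions were symmetric (subtract on read, add on write), the round trip would be lossless; the mismatch is exactly the reported 8-hour error.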
@lamber-ken thanks, it looks good to me. Our servers are in …

I got the same problem too: eventTime comes out as 2018-01-01 17:13:30.0, which is 8 hours off.

hi @zentol, cc
Hi @lamber-ken, thanks for the explanation. I am still concerned whether this is the best solution to the problem, since it works backwards from Calcite. I tagged https://issues.apache.org/jira/browse/FLINK-8353 as a more well-structured plan to add timezone support to Flink.
```java
Timestamp procTimestamp = (Timestamp) value.getField(0);
// validate the second here
long procSecondTime = procTimestamp.getTime() / 1000;
```
This could cause trouble: runs can cross a second boundary, so this is not a stable ITCase in my opinion.
Yeah, it's better to validate the hour. I'll update.
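A sketch of the hour-based check, assuming a Calendar in an explicit time zone; the class and method names are illustrative, not the PR's actual test code.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class HourCheckDemo {
    // Validate the hour rather than the second, so the assertion cannot
    // flip when the test happens to cross a second boundary mid-run.
    public static int hourOf(long millis, TimeZone tz) {
        Calendar cal = Calendar.getInstance(tz);
        cal.setTimeInMillis(millis);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(hourOf(1544074833000L, utc)); // 5 (2018-12-06 05:40:33 UTC)
    }
}
```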
@walterddr, as the https://issues.apache.org/jira/browse/FLINK-8353 description says.

@walterddr could you please check the timezone problem when dealing with event time?

FLINK-11010 only reports bugs for utilizing …
@lamber-ken Thanks for looking into the problem and bringing up the discussion.

I'm afraid we can't simply add the offset. I have answered your question on the user mailing list. Below is a copy:

The two methods (adding the offset and subtracting the offset) do not have to appear in pairs. Currently, Flink doesn't support time zone configuration. The timestamp (a time of type Timestamp) always means the time in UTC+0. So in the test of your pr[1], the output timestamp means a time in UTC+0, instead of a time in your timezone. You can verify it by changing your SQL to:
String sqlQuery = "select proctime, LOCALTIMESTAMP from MyTable";
But you raised a good question, and it is true that it would be better to support time zone configuration in Flink, for example via a global timezone setting. However, it is not a one-or-two-line code change. We need to take all operators into consideration, and it is better to solve it once and for all.
Best, Hequn
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
hi, @hequn8128, tableEnv can set the timezone with code like the one below, but it only sets the timezone value; it is not applied by the Flink system. Yes, it is better to solve it once and for all, not just add the offset simply.

Or we could just add the offset now, and then redesign how to solve this problem (for example, a global timezone configuration), because that would solve the problem currently.
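For reference, a fragment of the kind of code meant here; it assumes the old flink-table TableConfig#setTimeZone API of that era and is illustrative only, not a runnable program.

```java
// Illustrative fragment (assumes the pre-1.8 flink-table TableConfig API):
// the time zone can be set on the config, but the runtime conversion
// paths do not apply it, so results stay UTC+0-based.
tableEnv.getConfig().setTimeZone(java.util.TimeZone.getTimeZone("Asia/Shanghai"));
```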
@lamber-ken Yes, you are right. It would be good if we could solve it in a simple way. However, your pr would bring some side effects. Basically, Flink returns timestamps in UTC+0. With your pr, the output time is a local timestamp for proctime, but you only changed one place that represents a timestamp. There are a lot of other places that still return timestamps in UTC+0, which would bring confusion and inconsistency. Flink returns a local timestamp only in functions that clearly indicate that the return type is a local timestamp, such as LOCALTIMESTAMP.

Best, Hequn
@hequn8128, OK, I see. Thanks.

Besides the 'select proctime' case, the windowStart and windowEnd fields also have this problem when using a processing-time window.
What is the purpose of the change
The processing time is implemented by invoking System.currentTimeMillis(), but the long value is automatically wrapped into a Timestamp with the following statement:
new java.sql.Timestamp(time - TimeZone.getDefault().getOffset(time));
Brief change log
org.apache.flink.table.codegen.CodeGenerator#generateProctimeTimestamp
add the default TimeZone offset
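The effect of the change can be sketched as follows: pre-compensating with the time zone offset makes the later subtraction in internalToTimestamp cancel out. The class and method names are illustrative, and a fixed GMT+8 zone stands in for TimeZone.getDefault() so the example is deterministic.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class ProctimeOffsetDemo {
    static final TimeZone TZ = TimeZone.getTimeZone("GMT+8"); // stand-in for the default zone

    // Calcite-style conversion: subtracts the zone offset.
    static Timestamp internalToTimestamp(long v) {
        return new Timestamp(v - TZ.getOffset(v));
    }

    // With the pre-compensation this PR adds in generateProctimeTimestamp,
    // the subtraction cancels and the wall-clock value is preserved.
    public static long compensated(long time) {
        return internalToTimestamp(time + TZ.getOffset(time)).getTime();
    }

    public static void main(String[] args) {
        long time = 1544074833000L;
        System.out.println(compensated(time) == time); // true: no 8-hour shift
    }
}
```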
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)

Documentation