[FLINK-6719] [docs] Add details about fault-tolerance of timers to ProcessFunction docs#5887
bowenli86 wants to merge 2 commits into apache:master from bowenli86:FLINK-6719
Looks good @bowenli86
fhueske left a comment
Thanks for extending and improving the documentation about timer @bowenli86!
I've made a few comments and suggestions.
Best, Fabian
### Timer Coalescing
### Optimizations - Timer Coalescing

Every timer registered at the `TimerService` via `registerEventTimeTimer()` or
Move the first paragraph under the ## Timer section
Also it would be great if you could find a good spot to add a note that calls to processElement() and onTimer() are always synchronized, i.e., users do not have to worry about concurrent modification of state.
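For readers following the coalescing discussion in this thread, here is a minimal plain-Java sketch of the idea. It assumes the timer service keeps at most one timer per key and timestamp, so rounding the target time to a coarser resolution reduces the number of distinct timers. The class and method names (`TimerCoalescingSketch`, `coalesce`, `distinctTimers`) are hypothetical and are not part of Flink's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this does not use or mirror Flink classes.
class TimerCoalescingSketch {

    // Round a (non-negative) target firing time down to a multiple of resolutionMs.
    static long coalesce(long targetTimeMs, long resolutionMs) {
        return (targetTimeMs / resolutionMs) * resolutionMs;
    }

    // Count the distinct timers one key would hold if every element registered
    // a timer at (elementTime + timeout), rounded to the given resolution.
    // A Set models the assumed "one timer per key and timestamp" behavior.
    static int distinctTimers(long[] elementTimesMs, long timeoutMs, long resolutionMs) {
        Set<Long> timers = new HashSet<>();
        for (long t : elementTimesMs) {
            timers.add(coalesce(t + timeoutMs, resolutionMs));
        }
        return timers.size();
    }

    public static void main(String[] args) {
        long[] times = {1_000L, 1_250L, 1_999L, 2_001L};
        // Millisecond resolution: every element gets its own timer.
        System.out.println(distinctTimers(times, 60_000L, 1L));
        // One-second resolution: timers in the same second collapse into one.
        System.out.println(distinctTimers(times, 60_000L, 1_000L));
    }
}
```

The trade-off is precision: with a one-second resolution a timer may fire up to one second earlier than the exact target time.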
</div>
</div>

### Fault Tolerance
Move the ### Fault Tolerance section above the ### Optimizations section.
Timers registered within `ProcessFunction` are fault tolerant.

Timers registered within `ProcessFunction` will be checkpointed by Flink. Upon restoring, timers that are checkpointed
from the previous job will be restored on whatever new instance is responsible for that key.
Add a note that timers are synchronously checkpointed (regardless of the configuration of the state backend). Hence, a large number of timers can significantly increase checkpointing time. See optimizations section for advice to reduce the number of timers.

For processing timer timers, note that the firing time of a timer is an absolute value of when to fire.

What this means is that if a checkpointed timer’s firing processing timestamp is t (which is basically the registering
(which is basically the registering time + configured trigger time)
This is often the case, but not necessarily true. Especially for processing time, the timer can also be set to something completely different. I'd remove this to avoid confusion.
For processing timer timers, note that the firing time of a timer is an absolute value of when to fire.

What this means is that if a checkpointed timer’s firing processing timestamp is t (which is basically the registering
time + configured trigger time), then it will also fire at processing timestamp t on the new instance. Therefore, you
What do you mean by "new instance"? Are you discussing the scenario in which a task is recovered on a different machine? I don't think we need to mention this. It should be quite clear that clock synchronization is an issue in processing time.
The information that a processing-time timer fires on restore if its time passed while the job was down is important. Also mention that this is true for savepoints, which is even more critical because more time may pass between taking a savepoint and restoring from it.
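The restore behavior described in this comment can be sketched in plain Java. This is an illustrative model only, under the assumption that each checkpointed processing-time timer stores an absolute firing timestamp and that any timer whose timestamp is at or before the wall-clock time at restore is due immediately. The class and method names are hypothetical and do not correspond to Flink classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Flink's implementation.
class RestoredProcessingTimeTimersSketch {

    // Timers whose absolute firing time has already passed at restore time,
    // returned in ascending (firing) order.
    static List<Long> dueOnRestore(TreeSet<Long> checkpointedTimers, long restoreTimeMs) {
        return new ArrayList<>(checkpointedTimers.headSet(restoreTimeMs, true));
    }

    // Timers that are still in the future at restore time.
    static List<Long> stillPending(TreeSet<Long> checkpointedTimers, long restoreTimeMs) {
        return new ArrayList<>(checkpointedTimers.tailSet(restoreTimeMs, false));
    }

    public static void main(String[] args) {
        TreeSet<Long> timers = new TreeSet<>();
        timers.add(10_000L);
        timers.add(20_000L);
        timers.add(90_000L);

        // Suppose the job (or savepoint) is restored when the clock reads 60_000 ms.
        long restoreTimeMs = 60_000L;
        System.out.println("due immediately: " + dueOnRestore(timers, restoreTimeMs));
        System.out.println("still pending:   " + stillPending(timers, restoreTimeMs));
    }
}
```

The longer the gap between taking the snapshot and restoring from it, the more timers fall into the "due immediately" group, which is why the savepoint case deserves an explicit mention.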

#### Event Time Timers

For event time timers, given that Flink does not checkpoint watermarks, a restored event time timer will fire when the
The fact that Flink doesn't checkpoint watermarks is not really related to, and does not affect, the behavior of timers. It is useful information, but I don't think we need to mention it here.
It's sufficient to mention that event-time timers fire when the watermark passes them.
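As a rough illustration of "event-time timers fire when the watermark passes them", the following plain-Java sketch keeps timers in a priority queue and fires every timer whose timestamp is less than or equal to the new watermark. The names (`EventTimeTimerSketch`, `register`, `advanceWatermark`) are hypothetical, and the model deliberately ignores keys and state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Flink's implementation.
class EventTimeTimerSketch {

    // Pending timers ordered by timestamp, earliest first.
    private final PriorityQueue<Long> timers = new PriorityQueue<>();

    void register(long timestampMs) {
        timers.add(timestampMs);
    }

    // Fire (remove and return) all timers with timestamp <= watermark, in order.
    List<Long> advanceWatermark(long watermarkMs) {
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek() <= watermarkMs) {
            fired.add(timers.poll());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeTimerSketch sketch = new EventTimeTimerSketch();
        sketch.register(5_000L);
        sketch.register(1_000L);
        sketch.register(9_000L);

        System.out.println(sketch.advanceWatermark(500L));   // nothing is due yet
        System.out.println(sketch.advanceWatermark(5_000L)); // timers at 1_000 and 5_000 fire
    }
}
```

In this model a restored timer behaves like any other: it stays pending until a watermark at or beyond its timestamp arrives.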
@fhueske updated! Let me know how it looks now.

Timers registered within `ProcessFunction` are fault tolerant. They are synchronously checkpointed by Flink, regardless of
configurations of state backends. (Therefore, a large number of timers can significantly increase checkpointing time. See optimizations
section for advice to reduce the number of timers.)
See the optimizations section for advice on how to reduce the number of timers.
Thanks for the update @bowenli86. I'll merge the PR later.
What is the purpose of the change
The fault tolerance of timers is a frequently asked question on the mailing lists. We should add details about the topic to the ProcessFunction docs.
Brief change log
Added details about the topic in the ProcessFunction docs.
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
none
Documentation
none