New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-5929] Allow Access to Per-Window State in ProcessWindowFunction #3479
Conversation
Right now, the state that a WindowFunction or ProcessWindowFunction can access is scoped to the key of the window but not the window itself. That is, state is global across all windows for a given key. For some use cases it is beneficial to keep state scoped to a window. For example, if you expect to have several Trigger firings (due to early and late firings) a user can keep state per window to keep some information between those firings.
Thanks @sjwiesman! I'll have a look. |
One quick initial remark: instead of each time having an anonymous inner class for the
Using it would looks something like this:
The |
|
||
@Override | ||
public KeyedStateStore windowState() { | ||
if (windowAssigner instanceof MergingWindowAssigner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking every time you could initialise the WindowContext
with either a WindowStateStore
or the (exception throwing) MergingStateStore
at the beginning.
|
||
@Override | ||
public KeyedStateStore globalState() { | ||
if (windowAssigner instanceof MergingWindowAssigner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think access to global state is fine for merging windows.
@@ -39,5 +40,31 @@ | |||
* @param out A collector for emitting elements. | |||
* @throws Exception The function may throw exceptions to fail the program and trigger recovery. | |||
*/ | |||
@Deprecated | |||
void apply(KEY key, W window, IN input, Collector<OUT> out) throws Exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aljoscha I meant to ask, should I leave this method or remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I meant to actually write that earlier. Yes: please remove. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed an issue when removing apply. The method is used inside of AccumulatingKeyedTimePanes which takes in an AbstractStreamOperator as an argument to its evaluateWindow method. When creating the context I can get the global keyed state backend from the operator, but not the partitioned state because those methods are protected. Now the only two uses of this class are its subclasses which have both been deprecated. My question is, do you think I should modify the evaluateWindow method to accept a keyed state store which wraps the operator partitioned state or just throw an exception on context.windowState() because all valid uses of this method have been deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now it's OK to throw an Exception here.
…emove InternalWindowFunction#apply
@aljoscha I made the changes you asked for. Just a heads up, there are a number of files that were superficially changed when migrating from apply -> process but are otherwise untouched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks almost done now! 👍
I had some comments about leftover inner anonymous class contexts that were not yet moved over to the reusable context class.
@@ -153,6 +161,8 @@ | |||
|
|||
protected transient Context context = new Context(null, null); | |||
|
|||
protected transient WindowContext windowContext = new WindowContext(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to make it more clear what they do we should rename these two contexts to triggerContext
and processContext
.
@@ -628,6 +644,123 @@ protected final boolean isCleanupTime(W window, long time) { | |||
return time == cleanupTime(window); | |||
} | |||
|
|||
public abstract class KeyedStateStoreWithWindow implements KeyedStateStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe comment that we have a base class where we can set the window so that we can once create a MergingKeyStore
or WindowPaneKeyStore
(depending on the window assigner) and then not care about the distinction anymore.
I remember I suggested this but now struggled to see why there is the base class. 😉
@Override | ||
public void clear(final W window, final InternalWindowContext context) throws Exception { | ||
ProcessAllWindowFunction<V, R, W> wrappedFunction = this.wrappedFunction; | ||
final ProcessAllWindowFunction<V, R, W>.Context ctx = wrappedFunction.new Context() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover anonymous inner Context
. This should also use this.ctx
like in process()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops 😱
@@ -92,12 +93,45 @@ public void process(K key, final Context context, Iterable<T> values, Collector< | |||
result = foldFunction.fold(result, val); | |||
} | |||
|
|||
windowFunction.process(key, windowFunction.new Context() { | |||
ProcessWindowFunction<ACC, R, K, W>.Context ctx = windowFunction.new Context() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can benefit from a similar refactoring as the internal window functions, i.e. creating an internal context class instead of the anonymous inner classes.
This also holds for FoldApplyProcessAllWindowFunction
, ReduceApplyProcessAllWindowFunction
and ReduceApplyProcessWindowFunction
.
sorry for the delay, things got crazy at work. let me know if there are any issues. |
Don't worry. 😃 Is it ready for another review pass now? |
It looks like when I rebased on master I broke one of the scala side outputs test. I'm going to push a fix right now but it won't change any of the code surrounding this pr. |
Ok, please ping me when you pushed the fix. 😃 |
Pushed the fix, I had to update SideOutputsITCase so the ProcessAllWindowFunctions had a noop clear method. All tests passed locally, take a look |
Thanks! |
Thanks for implementing this! 😃 I just merged, could you please close this PR? |
Done! Thank you for for helping me get this feature merged in. This has to be one of the most painless commits I've ever made to an open source project of this size. |
Hehe, thanks! 😄 |
Right now, the state that a WindowFunction or ProcessWindowFunction can
access is scoped to the key of the window but not the window itself.
That is, state is global across all windows for a given key.
For some use cases it is beneficial to keep state scoped to a window.
For example, if you expect to have several Trigger firings (due to early
and late firings) a user can keep state per window to keep some
information between those firings.
@aljoscha