[FLINK-3659] Allow ConnectedStreams to Be Keyed on Only One Side #1831
Thanks Aljoscha, this seems to work 👍
I think you can go ahead and merge this if no one has any objections :)
Can someone elaborate on the semantics? I am against merging something that changes semantics and has zero description.
For example, how does keyed state work for the input side that is not key-partitioned? How is the key found? How is partitioning guaranteed?
This PR does not change the behaviour of any existing Flink applications. It does, however, allow users to specify the key of only one input of the CoMapFunctions, for instance:

This was previously impossible, which probably blocks many existing use cases where one input does not have associated state (it is actually blocking one of my own applications that I am trying to build on Flink).

After this change the key-value state defined by the keyed input stream works as expected. The one unfortunate behaviour is that users can still call state.value() for inputs from the non-keyed stream, and the behaviour in that case is not clearly defined. If there was already input from the other side, it returns the state for the last key; otherwise it will probably throw a NullPointerException. I think this is acceptable behaviour for the time being, because well-written programs will work as expected.

We can think about how we want to handle the non-keyed input, but that will probably involve changing many things in the KvBackends so they can do this properly. This problem already exists in Flink, as state access outside of the processing method is not well defined.
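The behaviour described in the comment above can be illustrated with a toy model in plain Java. This is a minimal sketch, not Flink's state backend: the class `KeyedStateSketch` and all of its methods are hypothetical names invented for illustration. It assumes only what the comment states, namely that the keyed input sets a "current key" before each element, and that the non-keyed input reads state without setting one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only. This is NOT Flink's KvState or state backend API;
// every name here is hypothetical and exists just for illustration.
class KeyedStateSketch {
    // One value per key, plus the key of the element currently being processed.
    private final Map<String, Long> valuesByKey = new HashMap<>();
    private String currentKey; // stays null until the keyed input delivers an element

    // Reads the value for the current key; fails if no key was ever set.
    Long value() {
        if (currentKey == null) {
            throw new NullPointerException("state accessed without a current key");
        }
        return valuesByKey.get(currentKey);
    }

    // Writes the value for the current key; fails if no key was ever set.
    void update(Long newValue) {
        if (currentKey == null) {
            throw new NullPointerException("state updated without a current key");
        }
        valuesByKey.put(currentKey, newValue);
    }

    // Keyed input: the key is set first, so state access is well defined.
    void onKeyedElement(String key, long amount) {
        currentKey = key;
        Long old = value();
        update(old == null ? amount : old + amount);
    }

    // Non-keyed input: no key is set, so this reads whatever key was current last.
    Long onNonKeyedElement() {
        return value();
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch();
        try {
            state.onNonKeyedElement(); // no keyed element seen yet
        } catch (NullPointerException e) {
            System.out.println("non-keyed access before any key: " + e.getMessage());
        }
        state.onKeyedElement("a", 1);
        state.onKeyedElement("b", 5);
        // Returns the state of key "b", simply because "b" was processed last.
        System.out.println("non-keyed access after keys a, b: " + state.onNonKeyedElement());
    }
}
```

In this sketch a program that touches state only from `onKeyedElement` behaves as expected, which corresponds to the "well-written programs" case; the result of `onNonKeyedElement` depends entirely on the arrival order of the other input.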
To elaborate on this: state right now works well if you stick to the (admittedly somewhat hidden) rules. That is, you should only access state if there is a key available. If there is no key available, the behavior changes in unexpected ways depending on which state backend is used and on the capabilities of the key serializer. For example, let's look at access to

For these reasons I would like to change the semantics of state such that the user always has to call
I like @aljoscha's idea to separate user state access more explicitly from its implementation. Having an accessor would also allow us to get rid of the swapping of the actual state objects which are wrapped by the
Thanks for describing this. As far as I can see, this has some quite big implications. The state in the connected stream is now a "broadcast state", not partitioned state, so allowing that on the key/value state probably breaks some ongoing efforts, such as scaling. How about a cleaner separation of these things:
That gives us
Yes, I'll try to come up with ideas in that direction then. 👍
Closing this for now...
No description provided.