-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-5023][FLINK-5024] Add SimpleStateDescriptor to clarify the concepts #2768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| RocksDBKeyedStateBackend<K> backend) { | ||
|
|
||
| RocksDBKeyedStateBackend<K> backend | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please stick with the original code style here
| if (valueBytes == null) { | ||
| return stateDesc.getDefaultValue(); | ||
| } | ||
| return valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStream(valueBytes))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is used only in single threaded settings, we should try to reuse the DataInputViewStreamWrapper and ByteArrayInputStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may help reduce the number of object allocation, but i think the brought benefits are fewer that brought by the re-usage of ByteArrayOutputStream. Because the byte array inside ByteArrayInputStream is provided by other objects.
What do you think?
| out.defaultWriteObject(); | ||
| } | ||
|
|
||
| private void readObject(final ObjectInputStream in) throws IOException, ClassNotFoundException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method is not necessary, because it only triggers default serialization anyways.
aljoscha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like these changes a lot! Very good work.
I had some inline comments, mostly about missing Javadoc. One thing I'd like to see changed is the additional V generic parameter in a lot of the state accessor methods, I don't think we need that one and I would like it if we don't add additional generic parameters if we don't have to.
| private transient TypeInformation<T> typeInfo; | ||
|
|
||
| /** The default value returned by the state when no other value is bound to a key */ | ||
| protected transient T defaultValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might think about renaming this because for FoldingState it's not a default value but the initial accumulation value. (Just a suggestion, not strictly necessary)
| * partitioned the returned value is the same for all inputs in a given | ||
| * operator instance. If state partitioning is applied, the value returned | ||
| * depends on the current operator input, as the operator maintains an | ||
| * independent state for each partition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect, we don't maintain state by partition but by key. (This is not your fault, it was always like this on ValueState and you copied it from there, I just discovered it now when reading your code.)
| * | ||
| * @throws IOException Thrown if the system cannot access the state. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add a @deprecated description in the Javadoc, pointing to use get().
| import java.util.Map; | ||
|
|
||
| /** | ||
| * Heap-backed partitioned states whose values are not composited and is snapshotted into files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the is snapshotted into files portion is not always true, and also not necessary here.
| */ | ||
| @SuppressWarnings({"rawtypes", "unchecked"}) | ||
| <N, S extends State> S getPartitionedState( | ||
| <N, V, S extends State<V>> S getPartitionedState( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need the additional V parameter. It's never used and State<?> works just as while while State<V> just pretends to add more type safety. Because both ways work I would prefer to go with the solution that doesn't add additional generic parameters to methods.
Same holds for mergePartitionedStates().
| */ | ||
| @Internal | ||
| abstract class AbstractQueryableStateOperator<S extends State, IN> | ||
| abstract class AbstractQueryableStateOperator<V, S extends State<V>, IN> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here the same thing I mentioned earlier holds, we don't really need the additional V parameter.
| * @throws Exception Thrown, if the state backend cannot create the key/value state. | ||
| */ | ||
| protected <S extends State> S getPartitionedState(StateDescriptor<S, ?> stateDescriptor) throws Exception { | ||
| protected <V, S extends State<V>> S getPartitionedState(StateDescriptor<S> stateDescriptor) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the same thing I mentioned on KeyedStateBackend holds, we don't need the additional V parameter.
| * function (function is not part os a KeyedStream). | ||
| */ | ||
| <S extends State> S getPartitionedState(StateDescriptor<S, ?> stateDescriptor); | ||
| <V, S extends State<V>> S getPartitionedState(StateDescriptor<S> stateDescriptor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as elsewhere: we don't need V.
d006d31 to
7062b58
Compare
|
@aljoscha Thanks for your review. I have updated the PR according to your suggestion. |
|
@shixiaogang Thanks for the swift update! The changes look very good now though I found one last thing that could be problematic. This is changing the behaviour of |
|
Oh... I added another field to make the code more clear, but I did not notice the serialization problem. Thanks very much for your reminder. Your solution does work though the concept of "defaultValue" in folding states is a little confusing. Another solution to let What do you think? @aljoscha |
|
@shixiaogang you can also do that, but then the machinery in |
|
I moved default value from I also modify the implementation of @aljoscha What do you think of these changes? |
|
Thanks for updating, @shixiaogang. I'll look at this today! For the change to reducing state it would be better to have this as a separate issue/commit/PR. @fhueske you already opened an issue for this, right? Could you please point us to that? |
e78264b to
e885c7d
Compare
|
I rebased the branch to resolve the conflicts with the master branch. |
e885c7d to
4666207
Compare
|
Despite the changes in the state descriptors, the Flink jobs can restore from old versioned snapshots now. |
|
Hi @shixiaogang ! I went through this pull request and below are a few thoughts. The pull request changes many things together. Some can work, and for others I would suggest to do it differently. Changing the
|
|
Update: Just saw that you already did the migration support. Will merge the |
|
I posted a rebased version of the state descriptor refactoring here: #3243 |
|
@StephanEwen Thanks a lot for your comments. Removing This change is suggested by @aljoscha who wants to let broadcast states share the same interface (see the discussion in FLINK-5023) . As mentioned, the broadcast states are read-only in some cases. Hence it's suggested not to provide the Changing the The |
|
In fact, my original suggestion was to add a new interface and leave all the existing interfaces as they are. That way we would have the least amount of changes in existing code and still have a common interface for state that can be read. (Note that |
|
@aljoscha That way, it's very confusing that a These changes (mainly the introduction of the But I prefer to rethink the state hierarchy in the near future because there exists too much duplicated code now. |
|
Close the pull request because the state descriptor now is refactored with the introduction of composited serializers (See FLINK-5790). |
Changes in the definition of
StateandStateDescriptor:get()in theStateinterface.StateDescriptors.SimpleStateDescriptorto simplify the construction ofValueStateDescriptor,ReducingStateDescriptorandFoldingStateDescriptor.KeyedStateBackendandAbstractKeyedStateBackendaccordingly.ListStateDescriptoraccordingly.Changes in HeapStateBackend:
AbstractHeapStatenot implement theStateinterface. Theclear()method now is removed fromAbstractHeapState.HeapSimpleStateto simplify the implementation ofHeapValueState,HeapReducingStateandHeapFoldingState.HeapValueState,HeapReducingStateandHeapFoldingStateaccordingly.Changes in RocksDBStateBackend:
AbstractRocksDBStatenot implement theStateinterface, removing theclear()method. Now,AbstractRocksDBStatedoes not depend on the types ofStateandStateDescriptorany more.RocksDBSimpleStateto simplify the implementation ofRocksDBValueState,RocksDBReducingStateandRocksDBFoldingState.RocksDBValueState,RocksDBReducingStateandRocksDBFoldingStateaccordingly.Others:
States in the implementation of window operators.States in unit tests.=== UPDATE ===
clear()method fromStateUpdatableState