-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-27586][python] Support non-keyed co-broadcast processing #19878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
HuangXingBo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vancior Thanks a lot for the PR. I have left some comments.
| JPythonBroadcastStateTransformation = ( | ||
| gateway.jvm.org.apache.flink.streaming.api.transformations.python | ||
| ).PythonBroadcastStateTransformation | ||
| j_state_names = ListConverter().convert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the typeinfo set in MapDescriptor have any effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, but the base class of PythonBroadcastStateTransformation, AbstractBroadcastStateTransformation needs this for constructor, so this's just for expected behavior of the base class, althrough in PyFlink we are not restricting which broadcast states can be accessed during runtime.
| serialized_fn.map_state_write_cache_size, | ||
| ) | ||
| else: | ||
| operator_state_backend = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When serialized_fn is not a instance of flink_fn_execution_pb2.UserDefinedDataStreamFunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only DataStream API can use operator state, so I guess that's okay
...rg/apache/flink/streaming/api/transformations/python/PythonBroadcastStateTransformation.java
Outdated
Show resolved
Hide resolved
| raise Exception('Cannot override partitioning for KeyedStream.') | ||
|
|
||
| def broadcast(self) -> 'DataStream': | ||
| def broadcast(self, *args) -> Union['DataStream', 'BroadcastStream']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not going to support broadcast on keyedstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's will be another PR.
| def broadcast(self, *args) -> 'BroadcastStream': | ||
| pass | ||
|
|
||
| def broadcast(self, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the typehint too?
| pass | ||
|
|
||
| @overload | ||
| def broadcast(self, *args) -> 'BroadcastStream': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the typeint MapStateDescriptor?
| raise Exception('Cannot override partitioning for KeyedStream.') | ||
|
|
||
| def broadcast(self) -> 'DataStream': | ||
| def broadcast(self, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| def connect(self, ds: 'BroadcastStream') -> 'BroadcastConnectedStream': | ||
| pass | ||
|
|
||
| def connect(self, ds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the typeint Union['DataStream', 'BroadcastStream']?
25d5966 to
4404d2a
Compare
HuangXingBo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vancior I have only one comment. And I think you can rebase master and push again.
4404d2a to
f98551b
Compare
What is the purpose of the change
This PR implements non-keyed co-broadcast processing interfaces, including
BroadcastStream,BroadcastConnectedStreamandBroadcastProcessFunction.Brief change log
BroadcastStream,BroadcastConnectedStream,BroadcastProcessFunctionand related internal classesCO_BROADCAST_PROCESSfunction type for broadcast processingPythonBroadcastStateTransformationand corresponding translator to translate python broadcast process into different operators for batch or streamingVerifying this change
This change added tests and can be verified as follows:
test_co_broadcast_processin test_data_stream.pyDoes this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)Documentation