[FLINK-31185][Python] Support side-output in broadcast processing #22003
public <T> DataStreamPythonFunctionOperator<T> copy(
        DataStreamPythonFunctionInfo pythonFunctionInfo,
        TypeInformation<T> outputTypeInfo) {
    throw new RuntimeException("This should not be invoked on a DelegateOperator!");
}
Maybe we should implement this. This method is used when performing operator chaining optimization.
Currently, the Python broadcast operator does not support chaining with downstream operators, so I think this check could be removed in a follow-up PR that adds broadcast chaining support.
SimpleOperatorFactory<OUT> getOperatorFactory();
static void configureDelegatedOperator(
Should we rename it to `configure` or something similar? This method actually configures `operator`, not `delegateOperator`.
yield value[1]
yield tag, value[0]
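In PyFlink, a process function sends a record to a side output by yielding an `(output_tag, value)` pair instead of a bare value, which is the convention the test snippet above relies on; this PR extends it to broadcast processing. The sketch below is a pure-Python mock of that routing so it runs without a Flink cluster. `OutputTag` here is a hypothetical stand-in for `pyflink.datastream.OutputTag`, and `route` is a hypothetical helper, not PyFlink's actual runtime code.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class OutputTag:
    """Hypothetical stand-in for pyflink.datastream.OutputTag (only the tag id is modeled)."""

    def __init__(self, tag_id: str) -> None:
        self.tag_id = tag_id


def route(results: Iterable[Any]) -> Tuple[List[Any], Dict[str, List[Any]]]:
    """Hypothetical router: split a process function's yields into main and side outputs.

    A yielded 2-tuple whose first element is an OutputTag goes to that tag's side
    output; anything else (including ordinary tuples) goes to the main output.
    """
    main: List[Any] = []
    side: Dict[str, List[Any]] = {}
    for item in results:
        if isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], OutputTag):
            tag, value = item
            side.setdefault(tag.tag_id, []).append(value)
        else:
            main.append(item)
    return main, side


side_tag = OutputTag("side")


def process_element(value: Tuple[str, int]) -> Iterator[Any]:
    # Same shape as the test snippet: value[1] goes to the main output,
    # value[0] goes to the side output identified by side_tag.
    yield value[1]
    yield side_tag, value[0]


if __name__ == "__main__":
    main_out, side_out = route(process_element(("a", 1)))
    print(main_out, side_out)  # [1] {'side': ['a']}
```

Checking the first element's type, rather than treating every 2-tuple as tagged, keeps ordinary tuple records on the main output.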
self.env.set_parallelism(2)
This is unnecessary as the parallelism is 2 by default. Refer to PyFlinkStreamingTestCase for more details.
This is meant as an explicit reminder that the expected output assumes parallelism=2, with some elements duplicated.
What is the purpose of the change
This PR adds support for side outputs in broadcast processing in PyFlink.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
test_co_broadcast_side_output and test_keyed_co_broadcast_side_output in test_data_stream.py
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation