-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-17013][python] Support Python UDTF in old planner under batch mode #11668
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 48fe02a (Wed Apr 08 08:58:21 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HuangXingBo Thanks a lot for the PR. Some feedback about the code duplication problem. Other parts looks good overall. Besides, it would be great if you can rebase to the master. I will take a closer look then. Thank you
@@ -81,6 +81,42 @@ class PyFlinkBlinkStreamUserDefinedFunctionTests(UserDefinedTableFunctionTests, | |||
pass | |||
|
|||
|
|||
class PyFlinkBatchUserDefinedTableFunctionTests(PyFlinkBatchTableTestCase): | |||
|
|||
def test_table_function(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking to avoid these code duplications. Most parts of the code are the same, we only have to extract the different parts into methods and override the methods in the child classes. For example, we can add a base method named get_output
in the UserDefinedTableFunctionTests
and override the method in PyFlinkBatchUserDefinedTableFunctionTests
.
* The physical rule is responsible for convert {@link FlinkLogicalCorrelate} to | ||
* {@link DataSetPythonCorrelate}. | ||
*/ | ||
public class DataSetPythonCorrelateRule extends ConverterRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some base classes for the two classes, i.e., DataSetPythonCorrelateRule
and DataStreamPythonCorrelateRule
to avoid the code duplications.
48fe02a
to
7d2e31f
Compare
Thanks a lot for @hequn8128 review, I have addressed the comments at the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HuangXingBo Thanks a lot for the update. The code looks much better and in a good shape now.
def _register_table_sink(self, field_names: list, field_types: list): | ||
table_sink = source_sink_utils.TestAppendSink(field_names, field_types) | ||
self.t_env.register_table_sink("Results", table_sink) | ||
|
||
def _get_output(self, t): | ||
t.insert_into("Results") | ||
self.t_env.execute("test") | ||
return source_sink_utils.results() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making this two as the default implementation in the base class and only override them in the PyFlinkBatchUserDefinedTableFunctionTests
?
@@ -39,16 +40,14 @@ | |||
import scala.Some; | |||
|
|||
/** | |||
* The physical rule is responsible for convert {@link FlinkLogicalCorrelate} to | |||
* {@link DataStreamPythonCorrelate}. | |||
* The abstract physical rule base is responsible for convert {@link FlinkLogicalCorrelate} to physical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responsible for converting
/** | ||
* The factory is responsible to creating {@link DataStreamPythonCorrelate}. | ||
* The abstract factory is responsible to creating {@link DataSetPythonCorrelate} or {@link DataStreamPythonCorrelate}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is responsible for creating
import scala.Option; | ||
|
||
/** | ||
* The physical rule is responsible for convert {@link FlinkLogicalCorrelate} to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responsible for converting
} | ||
|
||
/** | ||
* The factory is responsible to creating {@link DataStreamPythonCorrelate}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responsible for creating
import scala.Option; | ||
|
||
/** | ||
* The physical rule is responsible for convert {@link FlinkLogicalCorrelate} to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responsible for converting
} | ||
|
||
/** | ||
* The factory is responsible to creating {@link DataSetPythonCorrelate}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responsible for creating
/** | ||
* The factory is responsible to creating {@link DataStreamPythonCorrelate}. | ||
*/ | ||
private static class DataStreamPythonCorrelateFactory extends PythonCorrelateFactoryBase{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a blank here, i.e., PythonCorrelateFactoryBase {
|
||
@Override | ||
public void bufferInput(Row input) { | ||
// always copy the input Row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more details that why we always copy the input Row. What do you think?
Thanks a lot for @hequn8128 review, I have addressed the comments at the latest commit. |
@HuangXingBo Thanks a lot for the update. Merging... |
What is the purpose of the change
This pull request will support Python UDTF in old planner under batch mode
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation