[FLINK-13811][python] Support converting flink Table to pandas DataFrame #12148
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit 634970c (Thu May 14 09:46:02 UTC 2020)
✅ no warnings
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
@dianfu Is Arrow a dependency of PyFlink? I mean, will Arrow be installed when running
Yes, it will be installed when running
@dianfu Thanks for your PR! Looks good overall, just some minor comments.
## Convert PyFlink Table to Pandas DataFrame

It also supports to convert a PyFlink Table to a Pandas DataFrame. Internally, it will materialize the results of the
supports to convert -> supports converting?
## Convert PyFlink Table to Pandas DataFrame

It also supports to convert a PyFlink Table to a Pandas DataFrame. Internally, it will materialize the results of the
table and serialize them into multiple arrow batches of Arrow columnar format at client side. The maximum arrow batch size
arrow -> Arrow?
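To make the batching described in the doc text concrete, here is a minimal, standard-library-only sketch of splitting materialized rows into columnar batches bounded by a maximum batch size. It is a conceptual illustration, not the PR's implementation and not the Arrow API: the function name `to_columnar_batches` and its parameters are hypothetical, and plain dicts of lists stand in for Arrow record batches.

```python
from itertools import islice


def to_columnar_batches(rows, column_names, max_batch_size):
    """Yield column-oriented batches (dict of name -> list of values).

    Hypothetical helper: dicts of lists stand in for Arrow record batches.
    Each batch holds at most max_batch_size rows.
    """
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    row_iter = iter(rows)
    while True:
        # Take up to max_batch_size rows from the materialized results.
        chunk = list(islice(row_iter, max_batch_size))
        if not chunk:
            return
        # Transpose the row-oriented chunk into columns.
        columns = zip(*chunk)
        yield dict(zip(column_names, (list(col) for col in columns)))


rows = [(1, "a"), (2, "b"), (3, "c")]
batches = list(to_columnar_batches(rows, ["id", "name"], max_batch_size=2))
```

With three rows and a maximum batch size of two, the rows end up in two batches, the second holding only the remaining row.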
def load_from_iterator(self, itor):
    class IteratorIO(io.RawIOBase):
        def __init__(self, itor):
Add a super().__init__() call?
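For reference, a minimal sketch of what an iterator-backed raw stream with the suggested `super().__init__()` call could look like. Only the class name and the `itor` parameter come from the diff above; the `leftover` attribute and the `readinto` logic are assumptions made for illustration and may differ from what the PR actually does.

```python
import io


class IteratorIO(io.RawIOBase):
    """Read-only raw stream that serves bytes pulled from an iterator."""

    def __init__(self, itor):
        # Initialise the io.RawIOBase machinery before adding our own state.
        super().__init__()
        self.itor = itor
        self.leftover = b""  # assumed attribute: unread part of the current chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Pull chunks until we have data; skip empty chunks so that a
        # zero-length chunk is not mistaken for end of stream.
        while not self.leftover:
            try:
                self.leftover = next(self.itor)
            except StopIteration:
                return 0  # end of stream
        n = len(b)
        output, self.leftover = self.leftover[:n], self.leftover[n:]
        b[:len(output)] = output
        return len(output)


# Wrap the raw stream so callers get normal buffered file-like reads.
stream = io.BufferedReader(IteratorIO(iter([b"ab", b"", b"cde"])))
data = stream.read()
```

Calling the base initializer explicitly keeps the subclass robust if the base class ever relies on state set up in its own `__init__`.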
private static boolean isBlinkPlanner(Table table) {
    TableEnvironment tableEnv = ((TableImpl) table).getTableEnvironment();
    if (tableEnv instanceof BatchTableEnvironment || tableEnv instanceof TableEnvImpl) {
The tableEnv instanceof BatchTableEnvironment check is unnecessary.
// The SelectTableSink of blink planner will convert the table schema and we
// need to keep the table schema used here be consistent with the converted table schema
TableSchema convertedTableSchema = SelectTableSinkSchemaConverter.changeDefaultConversionClass(table.getSchema());
DataFormatConverters.DataFormatConverter converter = DataFormatConverters.getConverterForDataType(convertedTableSchema.toRowDataType());
Break this line as it is too long?
def test_to_pandas(self):
    table = self.t_env.from_pandas(self.pdf, self.data_type)
    result_pdf = table.to_pandas()
    self.assertTrue(2, len(result_pdf))
assertEqual?
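The reason this matters: `assertTrue(expr, msg)` only checks that its first argument is truthy and treats the second as the failure message, so `assertTrue(2, len(result_pdf))` passes regardless of the length. A small standalone sketch (the class and test names are made up for illustration, and a plain list stands in for the DataFrame):

```python
import io
import unittest


class AssertionDemo(unittest.TestCase):
    def test_assert_true_passes_despite_wrong_length(self):
        rows = ["a", "b", "c"]  # three rows, although the test "expects" two
        # assertTrue(expr, msg): 2 is truthy, len(rows) is only the failure message.
        self.assertTrue(2, len(rows))

    def test_assert_equal_detects_wrong_length(self):
        rows = ["a", "b", "c"]
        # assertEqual really compares 2 with 3 and raises AssertionError.
        with self.assertRaises(AssertionError):
            self.assertEqual(2, len(rows))


def run_demo():
    """Run both tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionDemo)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_demo()
```

Both demo tests succeed, which shows that the `assertTrue` form would not have caught a wrong row count while `assertEqual` does.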
@WeiZhong94 Thanks for the review. The test failures are unstable Kafka test cases, which are not related to this PR. Merging...
What is the purpose of the change
This pull request adds support for converting a Flink Table to a pandas DataFrame.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation