[FLINK-18232][hive] Fix Hive streaming source bugs #12573
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot; I help the community review pull requests.

Automated Checks: last check on commit afc18c9 (Wed Jun 10 07:34:42 UTC 2020). Warnings:
Mention the bot in a comment to re-run the automated checks.

Review Progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands: the @flinkbot bot supports the following commands:
URI uri = fileSplit.getPath().toUri();
long length = fileSplit.getLength();
// Hadoop FileSplit should not have -1 length.
if (length == -1) {
Can we update the doc of FileInputSplit::getLength() to indicate that length == -1 means to read all data from the file? I'd feel more comfortable about this change if it were guaranteed by the API contract.
Actually, there is already a comment in FileInputSplit: "the number of bytes in the file to process (-1 is flag for 'read whole file')".
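For context, the convention discussed above can be sketched as follows. This is an illustrative example only, not Flink's actual code; the class and method names here are hypothetical:

```java
// Sketch: honoring the FileInputSplit convention that a split length
// of -1 is the flag for "read whole file". Names are illustrative.
public class SplitLengthExample {

    /** Resolve the effective number of bytes to read for a split. */
    static long effectiveLength(long splitLength, long fileSize) {
        // -1 is the documented flag for "read whole file"
        return splitLength == -1 ? fileSize : splitLength;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLength(-1, 1024));  // whole file: 1024
        System.out.println(effectiveLength(512, 1024)); // bounded split: 512
    }
}
```

A reader of the split must branch on the flag value rather than treat -1 as a real byte count, which is the guarantee the reviewer wanted stated in the getLength() Javadoc.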
}
@VisibleForTesting
static List<Tuple2<List<String>, Long>> suitablePartitions(
Add some comments for this method? What is a suitable partition?
LGTM except for one minor comment
What is the purpose of the change

HiveTableSource.createStreamSourceForNonPartitionTable should use local time zone milliseconds instead of UTC milliseconds, because ContinuousFileMonitoringFunction uses local zone milliseconds.

Verifying this change
HiveTableSourceITCase.testStreamPartitionRead
HiveTableFileInputFormatTest
DirectoryMonitorDiscoveryTest
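The time zone issue described above can be illustrated with a small sketch. This is not Flink's actual code; the method names and the example zone are assumptions, chosen only to show why interpreting the same wall-clock timestamp in two different zones yields different epoch milliseconds:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class ZoneMillisExample {

    // Interpret a wall-clock partition timestamp in the local zone,
    // matching how ContinuousFileMonitoringFunction compares times.
    static long localZoneMillis(LocalDateTime ts, ZoneId zone) {
        return ts.atZone(zone).toInstant().toEpochMilli();
    }

    // The problematic interpretation: treating the same wall-clock
    // timestamp as if it were in UTC.
    static long utcMillis(LocalDateTime ts) {
        return ts.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2020, 6, 10, 7, 0);
        ZoneId shanghai = ZoneId.of("Asia/Shanghai"); // UTC+8, no DST
        // The two readings differ by the full zone offset, so comparing
        // a UTC-derived value against locally derived file modification
        // times filters the wrong files.
        System.out.println(utcMillis(ts) - localZoneMillis(ts, shanghai));
    }
}
```

With a UTC+8 zone the gap is eight hours of milliseconds, which is large enough to make a streaming source skip or re-read whole partitions.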
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no

Documentation