New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-3889] Make File Monitoring Function checkpointable. #1984
Conversation
It adds a method failExternally() to the StreamTask, so that custom Operators can make their containing task fail when needed.
This adds a new interface called CheckpointableInputFormat which describes input formats whose state is queryable, i.e. getCurrentChannelState() returns where the reader is in the underlying source, and they can resume reading from a user-specified position. This functionality is not yet leveraged by current readers.
After the discussion we had today with @StephanEwen and @aljoscha , I also added the PROCESS_ONCE watchType which processes the current (when invoked) content of a file/directory and exits. This is to be able to accommodate bounded file sources (a la batch). |
@@ -26,6 +26,11 @@ | |||
import org.apache.flink.core.memory.DataOutputView; | |||
|
|||
@Public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Public
should go after Javadoc, IMHO. Also, the last line of the Javadoc has an extra *
.
Overall, the code looks very good! I had some inline comments about Javadoc/comments. One thing that might be wrong, though is the interplay between |
85688c5
to
c2c18e3
Compare
Thanks for the comments @aljoscha . The only comment not yet integrated is the one with the {{ OutputTypeConfigurable }} which I have to understand a bit better how to implement correctly. As for the spaces, it it intellij that add them. I will try to fix them also later. In addition, and to clean up the JIRAs, I closed the FLINK-3808 issue, as now it is integrated into FLINK-3889, which corresponds to this PR. |
This is meant to replace the different file reading sources in Flink streaming. Now there is one monitoring source with DOP 1 monitoring a directory and assigning input split to downstream readers. In addition, it makes the new features added by FLINK-3717 work together with the aforementioned entities (the monitor and the readers) in order to have fault tolerant file sources and exactly once guarantees. This does not replace the old API calls. This will be done in a future commit.
This pull request introduces the underlying functionality to make Streaming File Sources persistent.
It does not yet change the API calls, as this will be done after agreeing on the current architecture and
implementation.
In addition, this PR includes a commit for FLINK-3896. This allows an operator to cancel its container task. The need for this functionality came during a discussion with @StephanEwen and @aljoscha and it is a separate commit.