Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-3889] Make File Monitoring Function checkpointable. #1984

Closed
wants to merge 3 commits into from

Conversation

kl0u
Copy link
Contributor

@kl0u kl0u commented May 12, 2016

This pull request introduces the underlying functionality to make Streaming File Sources persistent.
It does not yet change the API calls, as this will be done after agreeing on the current architecture and
implementation.

In addition, this PR includes a commit for FLINK-3896. This allows an operator to cancel its container task. The need for this functionality came during a discussion with @StephanEwen and @aljoscha and it is a separate commit.

kl0u added 2 commits May 11, 2016 18:03
It adds a method failExternally() to the StreamTask, so that custom Operators
can make their containing task fail when needed.
This adds a new interface called CheckpointableInputFormat
which describes input formats whose state is queryable,
i.e. getCurrentChannelState() returns where the reader is
in the underlying source, and they can resume reading from
a user-specified position.

This functionality is not yet leveraged by current readers.
@kl0u
Copy link
Contributor Author

kl0u commented May 13, 2016

After the discussion we had today with @StephanEwen and @aljoscha , I also added the PROCESS_ONCE watchType which processes the current (when invoked) content of a file/directory and exits. This is to be able to accommodate bounded file sources (a la batch).

@@ -26,6 +26,11 @@
import org.apache.flink.core.memory.DataOutputView;

@Public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Public should go after Javadoc, IMHO. Also, the last line of the Javadoc has an extra *.

@aljoscha
Copy link
Contributor

Overall, the code looks very good! I had some inline comments about Javadoc/comments.

One thing that might be wrong, though is the interplay between CheckpointableInputFormat.getCurrentChannelState() and SplitReader.getReaderState(). In the latter to check whether the result of the former is null, and then do something based on this. The former, however will never return null but instead always returns a Tuple2 that can have fields set to null. This might be an artifact form an earlier mode of implementation.

@kl0u
Copy link
Contributor Author

kl0u commented May 17, 2016

Thanks for the comments @aljoscha .

The only comment not yet integrated is the one with the {{ OutputTypeConfigurable }} which I have to understand a bit better how to implement correctly. As for the spaces, it it intellij that add them. I will try to fix them also later.

In addition, and to clean up the JIRAs, I closed the FLINK-3808 issue, as now it is integrated into FLINK-3889, which corresponds to this PR.

This is meant to replace the different file
reading sources in Flink streaming. Now there is
one monitoring source with DOP 1 monitoring a
directory and assigning input split to downstream
readers.

In addition, it makes the new features added by
FLINK-3717 work together with the aforementioned entities
(the monitor and the readers) in order to have
fault tolerant file sources and exactly once guarantees.

This does not replace the old API calls. This
will be done in a future commit.
@kl0u kl0u closed this May 26, 2016
@kl0u kl0u deleted the ft_files branch June 6, 2016 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants