Experiments with exactly-once streaming (using git semantics).
Flink's Kafka connector provides exactly-once guarantees when acting as a source (consumer) but not as a sink (producer) (reference). In the event of failure, a Kafka source can simply rewind to the offset tracked in the checkpoint state, but Kafka provides no way to undo records that have already been produced, so the sink cannot be rewound. This limitation raises the question of how to extend Kafka (or a similar system) to provide exactly-once guarantees for a Kafka sink. Since Kafka is envisioned as a commit log, might an answer be found in commit log concepts? This repository explores that possibility.
Git provides a useful conceptual framework for the investigation, since its concepts are familiar and it is easily programmable with jgit. The flink-git repository is thus an experimental connector, based on jgit, that explores providing exactly-once guarantees both as a source and as a sink.
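To make the "rewindable sink" idea concrete, here is a minimal sketch in plain Java. It is not the flink-git connector and does not use jgit or Flink. The class `RewindableLogSink` and its methods `write`, `snapshotState`, and `restoreState` are hypothetical names chosen for illustration. The sketch assumes the sink can record its head position at each checkpoint and, on recovery, discard everything written after that position, roughly what `git reset --hard` does to a branch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory commit log acting as a sink. Hypothetical; not the flink-git API. */
final class RewindableLogSink {
    // The "commit log": each element stands in for one committed record.
    private final List<String> log = new ArrayList<>();

    /** Append one record to the log. */
    void write(String record) {
        log.add(record);
    }

    /** At checkpoint time, return the current head (log length) to store in checkpoint state. */
    int snapshotState() {
        return log.size();
    }

    /** On recovery, drop every record appended after the checkpointed head. */
    void restoreState(int checkpointedHead) {
        log.subList(checkpointedHead, log.size()).clear();
    }

    /** Defensive copy of the log contents. */
    List<String> contents() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        RewindableLogSink sink = new RewindableLogSink();
        sink.write("a");
        sink.write("b");
        int head = sink.snapshotState(); // checkpoint completes here

        sink.write("c");                 // written after the checkpoint, then the job fails
        sink.restoreState(head);         // recovery: the sink rewinds, "c" is undone

        sink.write("c");                 // the rewound source replays "c"
        System.out.println(sink.contents()); // prints [a, b, c]
    }
}
```

Without the rewind in `restoreState`, the replayed record would appear twice in the log, which is the duplicate-output problem described above for a sink that cannot undo its writes.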
Not intended for real applications.
Please use the wiki for discussion.