
How to store processed log to Kafka or Cassandra with Manual Commit? #51

Closed
evans-ye opened this issue Nov 17, 2015 · 5 comments

@evans-ye

Sorry, I can't find a proper place to ask, so I'm opening an issue here (close it if you think that's improper).

I'm currently doing a PoC based on this awesome reactive kafka module with the manual commit feature, and I'm struggling to add a Sink that stores the log in a permanent storage system such as Kafka or Cassandra.
In your sample code, messages are processed on the fly in the processMessage function. If I need to store data in Kafka, I have to replace offsetCommitSink with another Kafka Sink, but then I can't use offsetCommitSink to stream back for committing.
Another approach is to use a saveToKafka function to store the processed log in Kafka (shown below), which is the current implementation of my PoC.

Source(consumerWithOffsetSink.publisher)
  .map(processMessage(_)) // your message processing
  .map(saveToKafka(_))
  .to(consumerWithOffsetSink.offsetCommitSink) // stream back for commit
  .run()

Do you think this is the best practice to achieve my goal?

@kciesielski
Contributor

@evans-ye Have you considered using Broadcast from Akka Streams' DSL? It allows "forking" the stream so that you could try to send your processed messages to two "branches": your topic where you save and the commit Sink.
Example: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-quickstart.html
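
For illustration, a minimal, self-contained sketch of that forking idea (not reactive-kafka's actual API: Msg, saveToMyTopic and offsetCommitSink below are hypothetical stand-ins, and it uses the later GraphDSL style rather than the 1.0-M2 FlowGraph from the linked docs):

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, GraphDSL, RunnableGraph, Sink, Source}

object BroadcastSketch extends App {
  implicit val system = ActorSystem("broadcast-sketch")
  implicit val materializer = ActorMaterializer()

  case class Msg(value: String) // hypothetical stand-in for the committable message type

  val messages = Source(List(Msg("a"), Msg("b")))
  val saveToMyTopic = Sink.foreach[Msg](m => println(s"saved ${m.value}"))        // stand-in for a Kafka producer sink
  val offsetCommitSink = Sink.foreach[Msg](m => println(s"committed ${m.value}")) // stand-in for the commit sink

  RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val bcast = b.add(Broadcast[Msg](2))
    messages ~> bcast.in
    bcast.out(0) ~> saveToMyTopic    // branch 1: save the processed message
    bcast.out(1) ~> offsetCommitSink // branch 2: commit the offset
    ClosedShape
  }).run()
}

Note that Broadcast on its own only forks the stream; it does not by itself order the commit after the save.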

@evans-ye
Author

Yeah, thanks for your prompt response.
CMIIW, but my thinking is that when using broadcast there's no guarantee the commit will be executed after the data has been saved to my topic, so there's a chance of losing data.

Let's say the branch saving data to my topic is slower and the committing branch is faster.
If a failure happens after the offsets have been committed back, but while the other branch is still busy saving data to my topic, the in-flight data is lost: the next time we fetch, we start from an offset that has already been advanced past it.

@jasongilanfarr

You definitely can do this (and I do) by using a broadcast and some flow stages so you'll commit only after the persist to Cassandra future completes. As a side benefit, this pseudo flow only sends the message itself to your persist method.

Source(publisher) ~> broadcast ~> Flow.map(_.message).mapAsync(1)(persist) ~> zip.in0
                     broadcast ~> zip.in1
zip.out ~> Flow.map(_._2) ~> commitSink
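
Spelled out as a runnable sketch (assumptions: Msg, persist, and commitSink are hypothetical stand-ins for the committable message type, the Cassandra write, and the offset commit sink, and the code uses the later GraphDSL style):

import scala.concurrent.Future
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source, Zip}

object CommitAfterPersist extends App {
  implicit val system = ActorSystem("commit-after-persist")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  case class Msg(message: String) // hypothetical stand-in for the committable message type

  def persist(payload: String): Future[Unit] = Future(println(s"persisted $payload")) // stand-in for the Cassandra write
  val commitSink = Sink.foreach[Msg](m => println(s"committed ${m.message}"))         // stand-in for the offset commit sink
  val messages = Source(List(Msg("a"), Msg("b")))

  RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val bcast = b.add(Broadcast[Msg](2))
    val zip   = b.add(Zip[Unit, Msg]())

    messages ~> bcast.in
    // Branch 1: extract the payload and persist it; mapAsync(1) only emits an
    // element once the Future returned by persist has completed, in order.
    bcast.out(0) ~> Flow[Msg].map(_.message).mapAsync(1)(persist) ~> zip.in0
    // Branch 2: pass the original message through unchanged.
    bcast.out(1) ~> zip.in1
    // Zip pairs element i of both branches, so a message reaches the commit sink
    // only after its own persist future has completed.
    zip.out ~> Flow[(Unit, Msg)].map(_._2) ~> commitSink
    ClosedShape
  }).run()
}

Because both branches preserve element order and Zip emits only when both inputs have produced an element, message i cannot be committed before persist has completed for it.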

@evans-ye
Author

OKAY! I got your point. This looks perfect to me.
Thank you so much for the great help and the great module!

I have just one more question, which I think gets at the benefit of using your module:
comparing reactive kafka with akka streams + akka persistence, both of which provide an at-least-once message guarantee, reactive kafka is much more efficient since we only write back offsets, while akka streams + akka persistence has to persist every log entry to disk. Am I correct?

@13h3r
Member

13h3r commented Apr 30, 2016

Looks like we're done with this.

@13h3r 13h3r closed this as completed Apr 30, 2016
@ennru ennru added this to the invalid milestone Jun 7, 2018