I have a project that deals with a high volume of Twitter data. Kafka is used to queue tweets for later processing. I have two metrics of interest: one shows how long it takes for a tweet to enter the system before being put into Kafka, and the other shows the total time from a tweet's creation until it is stored in a database after being fetched from Kafka. The first one is pretty much constant. The second, however, is slowly rising.
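To see which stage the rising time comes from, the two metrics can be split using per-tweet timestamps. A minimal Scala sketch, assuming the pipeline records when each tweet was created, handed to the Kafka producer, and written to the DB; `TweetTiming` and its field names are hypothetical:

```scala
// ASSUMPTION: these timestamps (epoch millis) are recorded per tweet; names are hypothetical.
final case class TweetTiming(
    createdAtMs: Long,  // when the tweet was created
    enqueuedAtMs: Long, // when it was handed to the Kafka producer
    storedAtMs: Long    // when it was written to the database
)

object LatencyMetrics {
  // Metric 1: creation -> put into Kafka (reported as roughly constant)
  def ingestLatencyMs(t: TweetTiming): Long = t.enqueuedAtMs - t.createdAtMs

  // Metric 2: creation -> stored in the DB (reported as slowly rising)
  def endToEndLatencyMs(t: TweetTiming): Long = t.storedAtMs - t.createdAtMs

  // The difference isolates time spent in Kafka plus consuming and writing to the DB.
  def downstreamLatencyMs(t: TweetTiming): Long = endToEndLatencyMs(t) - ingestLatencyMs(t)
}
```

Since the first metric stays constant, any rise in the end-to-end metric has to show up in `downstreamLatencyMs`, which makes the Kafka and consumer side the place to look.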
I'm investigating what the cause could be, and I ended up with the growing number of tweets in Kafka as the probable reason. That makes sense, since the time shown by the second metric rises in proportion to the growing number of tweets. The cause could be either the way Kafka is set up or the way I'm reading from the queue.
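The backlog hypothesis can be made concrete with a simple rate model: if tweets are produced faster than they are consumed, the unread backlog grows linearly, and a newly produced tweet waits roughly backlog / consume rate before it is read. A sketch under that assumption; the rates are hypothetical inputs, not measurements from this system:

```scala
// ASSUMPTION: constant produce and consume rates (messages per second).
object BacklogModel {
  // Unread backlog after `seconds`; with constant rates it changes linearly
  // and cannot drop below zero.
  def backlogAfter(initialBacklog: Long, produceRate: Long, consumeRate: Long, seconds: Long): Long =
    math.max(0L, initialBacklog + (produceRate - consumeRate) * seconds)

  // Approximate queueing delay (seconds) for a newly produced message:
  // it is read only after the backlog ahead of it has been drained.
  def approxWaitSeconds(backlog: Long, consumeRate: Long): Double = {
    require(consumeRate > 0, "consumeRate must be positive")
    backlog.toDouble / consumeRate
  }
}
```

For example, with 1,100 tweets/s produced and 1,000/s consumed, `backlogAfter(0, 1100, 1000, 60)` is 6000 and `approxWaitSeconds(6000, 1000)` is 6.0, so each minute adds about 6 seconds of waiting: a slow, steady rise like the one described above.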
Something that may be of interest: I have implemented a simple partitionizer for the producer, because I need the tweets kept in the order they were created within each conversation.
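```scala
import scala.collection.JavaConverters._

import akka.actor.Props
import com.softwaremill.react.kafka.{ProducerProperties, ReactiveKafka} // reactive-kafka 0.x package (assumed)
import kafka.serializer.StringEncoder

// Member of the supervising actor: `config` (Typesafe Config) and `subscriber`
// (the child actor's name) are defined elsewhere in that actor.
private def createSupervisedSubscriberActor() = {
  val kafka = new ReactiveKafka()
  val subscriberProperties = ProducerProperties(
    brokerList = config.getStringList("kafka.brokers").asScala.mkString(","),
    topic = config.getString("kafka.topic"),
    clientId = config.getString("kafka.clientId"),
    encoder = new StringEncoder(),
    partitionizer = partitionizer
  )
  val subscriberActorProps: Props = kafka.producerActorProps(subscriberProperties)
  context.actorOf(subscriberActorProps, subscriber)
}

// Partition key = the bytes of the entire message string.
private def partitionizer: String => Option[Array[Byte]] = (s: String) => Option(s.getBytes)
```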
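If the goal is per-conversation ordering, the key has to identify the conversation: Kafka only guarantees ordering within a partition, and a hash-based partitioner sends equal keys to the same partition. The partitionizer above uses the whole message string as the key, so tweets from one conversation generally get different keys and may be spread across partitions. A sketch of conversation-based keying with the same `String => Option[Array[Byte]]` shape, assuming a hypothetical message format of `<conversationId>|<payload>`; `extractConversationId` and `partitionFor` are illustrative helpers, and `partitionFor` is not Kafka's actual hash function:

```scala
import java.nio.charset.StandardCharsets.UTF_8

// ASSUMPTION (hypothetical format): each message is "<conversationId>|<tweet payload>".
object ConversationPartitionizer {
  def extractConversationId(message: String): Option[String] = {
    val sep = message.indexOf('|')
    if (sep > 0) Some(message.substring(0, sep)) else None
  }

  // Same shape as the original partitionizer. Every tweet in a conversation
  // yields identical key bytes; messages without an id yield None (no key).
  val partitionizer: String => Option[Array[Byte]] =
    (s: String) => extractConversationId(s).map(_.getBytes(UTF_8))

  // Illustrative stand-in for a hash partitioner (NOT Kafka's exact hash):
  // identical key bytes always map to the same partition.
  def partitionFor(key: Array[Byte], numPartitions: Int): Int =
    Math.floorMod(java.util.Arrays.hashCode(key), numPartitions)
}
```

With this keying, `"conv-42|first tweet"` and `"conv-42|a later reply"` produce the same key and therefore the same partition, which is what keeps them in order relative to each other.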
On the consumer side I am simply reading from the queue. My expectation is that I just get the latest message every time something is pushed to the end of the queue. Am I right? Can you suggest anything I could do to optimize my reading from Kafka? How does the size of the queue affect the way the consumer reads from it?
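For reference, a Kafka consumer does not jump to the newest message: it keeps a position (offset) per partition, and each fetch returns the next unread messages after that position, oldest first. The toy model below uses hypothetical `PartitionLog` and `ConsumerPosition` classes, not a Kafka API, to illustrate that behaviour and how lag (log-end offset minus position) relates to the size of the queue:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory model of a single partition (not a Kafka API).
final class PartitionLog {
  private val messages = ArrayBuffer.empty[String]
  def append(m: String): Unit = messages += m
  def logEndOffset: Long = messages.size.toLong // offset the next append will get
  def read(from: Long, max: Int): Seq[String] =
    messages.slice(from.toInt, from.toInt + max).toList
}

// Hypothetical consumer: always reads from its own position, never "the latest".
final class ConsumerPosition(log: PartitionLog) {
  private var position = 0L // next offset to read

  def poll(maxRecords: Int): Seq[String] = {
    val batch = log.read(position, maxRecords)
    position += batch.size
    batch
  }

  // Lag: messages already written but not yet read by this consumer.
  def lag: Long = log.logEndOffset - position
}
```

Where a consumer starts when it has no committed offset is configurable (`auto.offset.reset`), but once running it reads sequentially, so a growing lag means each new tweet is read later.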
Please let me know if you are missing any info.
Thanks for the help!