New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting timestamp of a specific offset #1187

Closed
Xaelias opened this Issue Aug 28, 2017 · 3 comments

Comments

Projects
None yet
3 participants
@Xaelias

Xaelias commented Aug 28, 2017

Hi folks!

I'm trying to achieve something that I feel like should be simple. And usually one of 2 things happen:

  • the world teaches me the hard way that it was not a simple thing
  • I'm too dumb to figure it out on my own :-D

Anyway, trying to do some monitoring for our kafka clusters. We obviously monitor the lag. And one thing that we are missing, is the answer to the question "how old is the message we haven't yet consumed from that queue". Since 0.10 has an embedded timestamp, I figured this was probably easy. Well I'm asking you guys to tell me if it isn't, or if I'm just dumb :-D

My approach to this is: knowing that for topic , and group , committed offset is , let's try to do this:

import kafka

kafka_host="localhost:9092"
topic="topic"
offset=21

client = kafka.KafkaClient(kafka_host)

payload = kafka.common.FetchRequestPayload(topic, 0, offset, 100000000)
response = client.send_fetch_request(payloads=[payload])[0]

print(response)

Alas... although I can get the timestamp from the console consumer, or even the KafkaConsumer when consuming these messages, whatever I tried, this always return a timestamp==None.

Is this expected?
I tried to look at the API in Apache wiki, and my first conclusion is that although the response always contains the field timestamp, it's not populated for this specific call?

Thanks guys!
Alexis

@tvoinarovskyi

This comment has been minimized.

Collaborator

tvoinarovskyi commented Aug 29, 2017

You are getting the problem because that API is deprecated and it does not support the needed version of the request to return the timestamp.

Try something like:

import kafka
from kafka.errors import OffsetOutOfRangeError

kafka_host = "localhost:9092"
topic = "topic"
partition = 0
offset = 1

consumer = kafka.KafkaConsumer(
    bootstrap_servers=kafka_host,
    group_id=None,              # We don't need group management
    auto_offset_reset="none",   # Don't reset offset if it's incorrect
    consumer_timeout_ms=500     # timeout for next() call below
)
tp = kafka.TopicPartition(topic, 0)

consumer.assign([tp])
consumer.seek(tp, offset)

try:
    msg = next(consumer)
except (StopIteration, OffsetOutOfRangeError):
    print("Message does not exist")
else:
    print(msg.timestamp)
@Xaelias

This comment has been minimized.

Xaelias commented Aug 29, 2017

That's what I feared.
Yeah using a consumer works. But it's awfully heavyweight for something I want to do on hundreds of topics and thousands of partitions...
Well... I guess I'll write the code and see how bad (or well, maybe I'm just pessimistic :-D) it performs :/

Thanks for the help :-)

@Xaelias Xaelias closed this Aug 29, 2017

@akashpagar1995

This comment has been minimized.

akashpagar1995 commented Nov 9, 2017

I want to create files of messages in timely manner, but when I run consumer in some cases it goes to wait condition for some message, because of this my file creation is not working properly. Is anyone knows how can this achieved using Kafka Python.

Thank you
sample.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment