Is your feature request related to a problem? Please describe.
I would like to be able to initialize a reader using an event time. Typical use cases:
- I have a long-lived, persistent topic containing system logs, and I want to do an ad-hoc query saying "show me logs starting from 1 hour ago".
- I have a reader which is storing its messageID state in a database, and I want to rewind it to a particular point in time.
- I am developing reader code and I want to test it across a subset of recent messages, not from the very beginning of the topic.
Currently, when you create a "reader", you can initialize it to start at either the earliest message or the end of the topic; there are special sentinel message IDs for those cases. You can also initialize it to any existing message ID on a topic, but to do that, you must already know a valid message ID.
Describe the solution you'd like
An API call which, given a topic and an event time, returns the message ID of the first message after that time. Alternatively, an additional option to createReader which takes an event time.[^1]
This functionality must already exist internally in Pulsar because the admin API has a reset-cursor call for subscriptions. I would like that internal search for event time to be exposed.
I suspect this would end up in the REST API rather than the client protocol. Ideally it would be accessible to clients without admin privileges. I found some work to allow subscription-related admin calls to unprivileged consumers when dealing with their own subscriptions - #2964 / #2981 - so I'd like the new call to be covered by this.
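To make the requested semantics concrete, here is a minimal sketch of the lookup over an in-memory list. It does not use the Pulsar client at all: `Entry`, `first_message_id_at_or_after`, and the integer message IDs are hypothetical stand-ins, and it assumes event times are non-decreasing along the topic. It also treats "after" as "at or after", which is my reading rather than something settled above.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    message_id: int     # hypothetical stand-in for a real Pulsar MessageId
    event_time_ms: int  # event time in milliseconds since the epoch


def first_message_id_at_or_after(log: Sequence[Entry], event_time_ms: int) -> Optional[int]:
    """Return the ID of the first entry whose event time is >= event_time_ms.

    Assumes `log` is ordered by non-decreasing event time. Returns None when
    every entry is older than the requested time.
    """
    times = [e.event_time_ms for e in log]
    i = bisect_left(times, event_time_ms)  # leftmost position with time >= target
    return log[i].message_id if i < len(log) else None


log = [Entry(100, 1_000), Entry(101, 2_000), Entry(102, 2_000), Entry(103, 5_000)]
print(first_message_id_at_or_after(log, 2_000))  # 101
print(first_message_id_at_or_after(log, 2_001))  # 103
print(first_message_id_at_or_after(log, 9_000))  # None
```

The returned ID is what I would then pass as the start position when creating the reader.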
Describe alternatives you've considered
For topic X, I could write a separate index topic X1 which emits a messageID and timestamp, say one message for every 10,000 messages in X. Then I can scan X1 looking for the last timestamp before the time of interest, and then read X forward from that messageID. This would have to be duplicated for every topic.
I could create a temporary subscription, use reset-cursor on it, and read using a consumer - or read one message, and use it to get the message ID to initialize a reader. It seems overblown to create and destroy a subscription just for that.
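The index-topic alternative (X and X1) could look roughly like the sketch below. This is a plain-Python simulation, not Pulsar code: topic X is modelled as a list of timestamps, the list position stands in for the message ID, and `build_index` / `read_from_time` are hypothetical names. It assumes timestamps in X are non-decreasing, and uses an interval of 3 instead of 10,000 to keep the example small.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_index(timestamps: List[int], every: int) -> List[Tuple[int, int]]:
    """Simulate index topic X1: one (timestamp, message_id) entry per `every` messages of X."""
    return [(ts, mid) for mid, ts in enumerate(timestamps) if mid % every == 0]


def read_from_time(timestamps: List[int], index: List[Tuple[int, int]], start_ts: int) -> List[int]:
    """Return the IDs of all messages in X whose timestamp is >= start_ts."""
    index_times = [ts for ts, _ in index]
    # Last index entry strictly before the time of interest.
    i = bisect_left(index_times, start_ts) - 1
    # If no index entry is old enough, fall back to the earliest message.
    start_id = index[i][1] if i >= 0 else 0
    # Read X forward from that ID, skipping messages that are still too old.
    return [mid for mid in range(start_id, len(timestamps)) if timestamps[mid] >= start_ts]


x = [10, 20, 30, 40, 50, 60, 70, 80]  # timestamps of messages 0..7 in topic X
x1 = build_index(x, every=3)
print(x1)                          # [(10, 0), (40, 3), (70, 6)]
print(read_from_time(x, x1, 45))   # [4, 5, 6, 7]
```

Even in this toy form, the reader still has to scan and discard up to one interval's worth of messages, and the index has to be produced and kept in step for every topic.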
Additional context
Provides feature parity with Kafka: getOffsetsByTimes / offsetsForTimes
[^1] EDIT: I see that Java Reader API already has seek(timestamp) and seek(messageId) - however Python Consumer has only seek(messageId), and Python Reader is missing seek completely. So I will close this out in favour of #5541
The Java consumer also has subscriptionInitialPosition, but it can only choose between Earliest and Latest.