Reader method to locate message ID by event time #5537

Closed
candlerb opened this issue Nov 2, 2019 · 1 comment

candlerb commented Nov 2, 2019

Is your feature request related to a problem? Please describe.
I would like to be able to initialize a reader using an event time. Typical use cases:

  • I have a long-lived, persistent topic containing system logs, and I want to do an ad-hoc query saying "show me logs starting from 1 hour ago"
  • I have a reader which is storing its messageID state in a database, and I want to rewind it to a particular point in time
  • I am developing reader code and I want to test it across a subset of recent messages, not from the very beginning of the topic.

Currently when you create a "reader", you can initialize it to start at either the earliest message or the end of the topic - there are special sentinel message IDs for those cases. You can also initialize it to any existing message ID on a topic, but to do that, you must already know a valid message ID.

Describe the solution you'd like
An API call which, given a topic and an event time, returns the message ID of the first message after that time. Alternatively, an additional option to createReader which takes an event time.[^1]

This functionality must already exist internally in Pulsar because the admin API has a reset-cursor call for subscriptions. I would like that internal search for event time to be exposed.

I suspect this would end up in the REST API rather than the client protocol. Ideally it would be accessible to clients without admin privileges. I found some work to allow subscription-related admin calls to unprivileged consumers when dealing with their own subscriptions - #2964 / #2981 - so I'd like the new call to be covered by this.
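
To make the requested lookup concrete, here is a minimal sketch of "given a topic and event time, return the message ID of the first message at or after that time". It is not Pulsar code: the topic is modelled as an in-memory list, `Message` and `first_message_id_at_or_after` are hypothetical names, message IDs are plain integers, and event times are assumed to be non-decreasing (which real producers do not guarantee). Treating "after" as "at or after" is also an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Message(NamedTuple):
    message_id: int     # hypothetical stand-in for a Pulsar MessageId
    event_time_ms: int  # event time in milliseconds since the epoch


def first_message_id_at_or_after(
    topic: Sequence[Message], event_time_ms: int
) -> Optional[int]:
    """Binary-search for the first message whose event time is >= event_time_ms.

    Assumes event times in `topic` are non-decreasing.
    Returns None when every message is older than event_time_ms.
    """
    lo, hi = 0, len(topic)
    while lo < hi:
        mid = (lo + hi) // 2
        if topic[mid].event_time_ms < event_time_ms:
            lo = mid + 1  # answer lies strictly to the right of mid
        else:
            hi = mid      # mid is a candidate; keep searching to the left
    return topic[lo].message_id if lo < len(topic) else None


topic = [
    Message(100, 1000),
    Message(101, 2000),
    Message(102, 2000),
    Message(103, 3000),
]
start_id = first_message_id_at_or_after(topic, 2000)
```

The returned ID is what a client would then pass when creating a reader, in place of the earliest/latest sentinels.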

Describe alternatives you've considered
For topic X, I could write a separate index topic X1 which emits a messageID and timestamp, say one message for every 10,000 messages in X. Then I can scan X1 looking for the last timestamp before the time of interest, and then read X forward from that messageID. This would have to be duplicated for every topic.
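
A rough sketch of that index-topic workaround, again with in-memory lists standing in for the topics X and X1. The names `build_index`, `start_id_from_index` and `read_from` are hypothetical, message IDs are plain integers that increase along the topic, and timestamps are assumed to be non-decreasing.

```python
from typing import Iterator, List, Optional, Tuple

INDEX_EVERY = 10_000  # one index entry per 10,000 messages, as described above

# Each message in X is modelled as (message_id, timestamp).
Msg = Tuple[int, int]
# Each entry in the index topic X1 is modelled as (timestamp, message_id).
IndexEntry = Tuple[int, int]


def build_index(messages: List[Msg], every: int = INDEX_EVERY) -> List[IndexEntry]:
    """Emit an index entry for every `every`-th message of X."""
    return [(ts, mid) for n, (mid, ts) in enumerate(messages) if n % every == 0]


def start_id_from_index(index: List[IndexEntry], t: int) -> Optional[int]:
    """Scan X1 for the last entry whose timestamp is before t.

    Returns None if no entry qualifies, meaning: start from the earliest message.
    """
    start = None
    for ts, mid in index:
        if ts >= t:
            break
        start = mid
    return start


def read_from(messages: List[Msg], index: List[IndexEntry], t: int) -> Iterator[Msg]:
    """Read X forward from the indexed position, skipping messages older than t."""
    start = start_id_from_index(index, t)
    for mid, ts in messages:
        if start is not None and mid < start:
            continue  # simulates opening the reader at `start`
        if ts >= t:
            yield (mid, ts)


messages = [(i, 1000 + 10 * i) for i in range(10)]
index = build_index(messages, every=3)
recent = list(read_from(messages, index, 1045))
```

The cost is at most `every` extra messages read and discarded per lookup, plus the upkeep of one index topic per data topic.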

I could create a temporary subscription, use reset-cursor on it, and read using a consumer - or read one message, and use it to get the message ID to initialize a reader. It seems overblown to create and destroy a subscription just for that.

Additional context
Provides feature parity with Kafka: getOffsetsByTimes / offsetsForTimes


[^1] EDIT: I see that the Java Reader API already has seek(timestamp) and seek(messageId) - however, the Python Consumer has only seek(messageId), and the Python Reader is missing seek completely. So I will close this out in favour of #5541

The Java consumer also has subscriptionInitialPosition, but it can only choose between Earliest and Latest.

@candlerb candlerb added the type/feature The PR added a new feature or issue requested a new feature label Nov 2, 2019
@candlerb candlerb closed this as completed Nov 2, 2019

candlerb commented Nov 4, 2019

Fixed by #5542
