Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improve] Correct the anti-intuitive behavior of seek when startMessageInclusive() is configured #24047

Open
2 tasks done
BewareMyPower opened this issue Mar 4, 2025 · 3 comments
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@BewareMyPower
Copy link
Contributor

BewareMyPower commented Mar 4, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Given a reader and the following code:

        final var msgIds = new ArrayList<MessageId>();
        final var timestamps = new ArrayList<Long>();
        for (int i = 0; i < 3; i++) {
            final var msg = reader.readNext();
            msgIds.add(msg.getMessageId());
            timestamps.add(msg.getPublishTime());
        }

        // 1. seek to the 2nd message's message id
        reader.seek(msgIds.get(1));
        log.info("Case 1: {}", reader.readNext().getValue());
        // 2. seek to the 2nd message's timestamp
        reader.seek(timestamps.get(1));
        log.info("Case 2: {}", reader.readNext().getValue());

It's intuitive to see values of case 1 and case 2 are the same.

However, if the reader has configured startMessageIdInclusive, the results are different:

2025-03-04T12:07:05,365 - INFO  - [main:SimpleProducerConsumerTest] - Case 1: msg-2
2025-03-04T12:07:05,483 - INFO  - [main:SimpleProducerConsumerTest] - Case 2: msg-1

See the full test code:

    @DataProvider
    public static Object[][] startMessageIdInclusive() {
        return new Object[][] { { true }, { false } };
    }

    @Test(dataProvider = "startMessageIdInclusive")
    public void test(boolean startMessageIdInclusive) throws Exception {
        final var topic = "test";
        @Cleanup final var producer = pulsarClient.newProducer(Schema.STRING).topic(topic).create();
        for (int i = 0; i < 3; i++) {
            producer.send("msg-" + i);
        }
        final var readerBuilder = pulsarClient.newReader(Schema.STRING).topic(topic).startMessageId(MessageId.earliest);
        if (startMessageIdInclusive) {
            readerBuilder.startMessageIdInclusive();
        }
        @Cleanup final var reader = readerBuilder.create();
        final var msgIds = new ArrayList<MessageId>();
        final var timestamps = new ArrayList<Long>();
        for (int i = 0; i < 3; i++) {
            final var msg = reader.readNext();
            msgIds.add(msg.getMessageId());
            timestamps.add(msg.getPublishTime());
        }

        // 1. seek to the 2nd message's message id
        reader.seek(msgIds.get(1));
        log.info("Case 1: {}", reader.readNext().getValue());
        // 2. seek to the 2nd message's timestamp
        reader.seek(timestamps.get(1));
        log.info("Case 2: {}", reader.readNext().getValue());
    }

Solution

The root cause is when #4331 introduced this config, it didn't consider the case when seeking by timestamp.

It's hard to change the existing behavior because many applications might rely on it. We'd better add new client side configs and mark existing startMessageInclusive as deprecated.

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@BewareMyPower BewareMyPower added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Mar 4, 2025
@lhotari
Copy link
Member

lhotari commented Mar 4, 2025

Is this in any way related to #23502? /cc @summeriiii

@summeriiii
Copy link
Contributor

it's related to #23508. seek by timestamp and startMessageIdInclusive have no relation.

@BewareMyPower
Copy link
Contributor Author

@lhotari No. They're not related. The behavior was introduced from the very beginning (#4331) and not changed since that. #23508 just fixes the wrong document.

shibd added a commit to shibd/pulsar-client-go that referenced this issue Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

No branches or pull requests

3 participants