Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IllegalStateException in PersistentReplicator #10098

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Mar 31, 2021

Fixes: #10097

Motivation

See #10097 for the issue. It seems that the code broke when the switch was made to LightProto in #9046.

Modifications

It is necessary to use msg.getMessageBuilder().hasReplicatedFrom() and use logic that only calls msg.getMessageBuilder().getReplicatedFrom() if hasReplicatedFrom() returns true.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.
the change looks good to me

Do you mind adding some minimal test case that reproduces the problem ?

it is scary that we do not have any broken test.

My reasoning is that we can commit this fix as it is but, as we are not in a hurry (no need for an hotfix, as 2.8.0 is still unreleased), we can spend a little more time and add a test that covers this change.

@lhotari
Copy link
Member Author

lhotari commented Mar 31, 2021

Do you mind adding some minimal test case that reproduces the problem ?

it is scary that we do not have any broken test.

My reasoning is that we can commit this fix as it is but, as we are not in a hurry (no need for an hotfix, as 2.8.0 is still unreleased), we can spend a little more time and add a test that covers this change.

Makes sense. It seems that integration tests are missing for replicated subscriptions (PIP-33) so it requires wider understanding of the feature to be able to add the required test coverage.

@merlimat do you have recommendations of how replicated subscriptions could be tested as part of the automated tests in CI?

@eolivelli
Copy link
Contributor

I suggest to not go in the direction of integration tests.
We should start two PulsarServices and a Global ZK, attach clients to the services and verify that everything works as expected.

Is it possible that we do not have tests like this ?

@lhotari
Copy link
Member Author

lhotari commented Mar 31, 2021

I suggest to not go in the direction of integration tests.
We should start two PulsarServices and a Global ZK, attach clients to the services and verify that everything works as expected.

Is it possible that we do not have tests like this ?

Makes sense. The integration test doesn't have to be under tests/integration where we have some integration tests. Most tests are more or less integration tests at the moment. :)

@lhotari
Copy link
Member Author

lhotari commented Mar 31, 2021

Fixing the IllegalStateException doesn't fix subscription replication. A new exception appears. I reported that in #10097 (comment) .

@lhotari
Copy link
Member Author

lhotari commented Mar 31, 2021

/pulsarbot run-failure-checks

1 similar comment
@lhotari
Copy link
Member Author

lhotari commented Mar 31, 2021

/pulsarbot run-failure-checks

@lhotari lhotari force-pushed the lh-fix-illegalstateexception-in-persistentreplicator branch from 4563683 to 28faa8e Compare April 1, 2021 11:59
@lhotari
Copy link
Member Author

lhotari commented Apr 1, 2021

I added a test that reproduces #10054 and it also reproduced the IllegalStateException . However it didn't reproduce the IllegalReferenceCountException that I noticed when testing manually.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member Author

lhotari commented Apr 1, 2021

/pulsarbot run-failure-checks

@lhotari lhotari requested a review from sijie April 1, 2021 20:36
@lhotari lhotari force-pushed the lh-fix-illegalstateexception-in-persistentreplicator branch from cca419d to 2cb384b Compare April 2, 2021 04:38
@lhotari
Copy link
Member Author

lhotari commented Apr 2, 2021

/pulsarbot run-failure-checks

4 similar comments
@lhotari
Copy link
Member Author

lhotari commented Apr 2, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Apr 2, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Apr 2, 2021

/pulsarbot run-failure-checks

@lhotari
Copy link
Member Author

lhotari commented Apr 2, 2021

/pulsarbot run-failure-checks

@eolivelli
Copy link
Contributor

@sijie can you please take another look ?

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Apr 5, 2021
@merlimat merlimat added this to the 2.8.0 milestone Apr 5, 2021
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sijie sijie merged commit 05f1f7e into apache:master Apr 9, 2021
@eolivelli
Copy link
Contributor

LightProto is not present in branch-7.2, removing release/2.7.2 label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replicated Subscription fail with IllegalStateException
5 participants