-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue 8138][pulsar-client] Improve timeout handling in ClientCnx to cover all remaining request types (GetLastMessageId, GetTopics, GetSchema, GetOrCreateSchema) #8149
Merged
jiazhai
merged 2 commits into
apache:master
from
lhotari:lh-pulsar-client-request-timeouts
Sep 29, 2020
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
lhotari
force-pushed
the
lh-pulsar-client-request-timeouts
branch
from
September 28, 2020 11:15
14f3a63
to
5dff7f1
Compare
- add timeout handling for requests of - GetLastMessageId - GetTopics - GetSchema - GetOrCreateSchema - extend the existing requestTimeoutQueue solution to handle these other request types besides the command requests that it is currently used for
lhotari
force-pushed
the
lh-pulsar-client-request-timeouts
branch
from
September 28, 2020 11:20
5dff7f1
to
2fb0ca5
Compare
- pendingGetSchemaRequests and pendingGetOrCreateSchemaRequests should be cleaned in the same way as other pending requests
/pulsarbot run-failure-checks |
1 similar comment
/pulsarbot run-failure-checks |
jiazhai
approved these changes
Sep 29, 2020
jiazhai
pushed a commit
that referenced
this pull request
Sep 30, 2020
Fixes #8138 ### Motivation The issue #8138 is about multiple memory leaks happening when using the Pulsar Java Client Reader API. One of the leaks was previously fixed by #8149 . This PR fixes two remaining ConsumerImpl memory leak issues: - leaks of ConsumerImpl instances via active timers (unAckedMessageTracker, acknowledgmentsGroupingTracker, stats) - leaks of ConsumerImpl instances via ClientCnx when the consumer gets closed by the Broker and the consumer picks a new connection to use ### Modifications - modifications to fix the leaks of ConsumerImpl via active timers: - calls to "closeConsumerTasks" were added in the locations where they were missing - modifications to fix the leaks of ConsumerImpl via ClientCnx - a solution was added to track to which ClientCnx the ConsumerImpl has been registered. When the ConsumerImpl gets registered to a new ClientCnx, the same instance is unregistered from the previous ClientCnx it was previously registered to. - when the connection gets switched, the ConsumerImpl must be de-registered from the previous ClientCnx where it got registered previously - the connection gets switched with the current seek implementation that disconnects the consumer after a seek. - the new connection is picked randomly of the existing connections in the connection pool. Therefore reproducing the leak required setting connectionsPerBroker > 1. - attempt to cover also other cases where the ConsumerImpl instance wasn't de-registered properly. - whenever client.cleanupConsumer(this) is called, the de-registration from ClientCnx (where the ConsumerImpl was previously registered) should be made - fix a bug in closeAsync where cleanup was skipped when the channel was closed or an exception occurred in sending the close consumer request ### Verifying this change - this PR doesn't include new tests. There is a manual test in lhotari@aff700c that has been used to reproduce the memory leak and to verify that no ConsumerImpl references are leaked after the fix has been applied.
lbenc135
pushed a commit
to lbenc135/pulsar
that referenced
this pull request
Oct 3, 2020
…cover all remaining request types (GetLastMessageId, GetTopics, GetSchema, GetOrCreateSchema) (apache#8149) * Add timeout handling for all other request types in ClientCnx Master Issue: apache#8138 ### Motivation Since one of the root causes for apache#8138 is that there isn't timeout handling for GetLastMessageId requests, this PR adds that. The PR improves timeout handling in ClientCnx to cover all remaining request types that don't currently have timeout handling. The request types are: - GetLastMessageId - GetTopics - GetSchema - GetOrCreateSchema ### Modifications The existing requestTimeoutQueue solution to handle "ordinary" command requests has been extended to cover all the requests types. ### Verifying this change This change added tests and can be verified as follows: - Added unit tests in a new test class ClientCnxRequestTimeoutQueueTest to test request timeout handling. - `cd pulsar-client; mvn test -Dtest=ClientCnxRequestTimeoutQueueTest`
lbenc135
pushed a commit
to lbenc135/pulsar
that referenced
this pull request
Oct 3, 2020
Fixes apache#8138 ### Motivation The issue apache#8138 is about multiple memory leaks happening when using the Pulsar Java Client Reader API. One of the leaks was previously fixed by apache#8149 . This PR fixes two remaining ConsumerImpl memory leak issues: - leaks of ConsumerImpl instances via active timers (unAckedMessageTracker, acknowledgmentsGroupingTracker, stats) - leaks of ConsumerImpl instances via ClientCnx when the consumer gets closed by the Broker and the consumer picks a new connection to use ### Modifications - modifications to fix the leaks of ConsumerImpl via active timers: - calls to "closeConsumerTasks" were added in the locations where they were missing - modifications to fix the leaks of ConsumerImpl via ClientCnx - a solution was added to track to which ClientCnx the ConsumerImpl has been registered. When the ConsumerImpl gets registered to a new ClientCnx, the same instance is unregistered from the previous ClientCnx it was previously registered to. - when the connection gets switched, the ConsumerImpl must be de-registered from the previous ClientCnx where it got registered previously - the connection gets switched with the current seek implementation that disconnects the consumer after a seek. - the new connection is picked randomly of the existing connections in the connection pool. Therefore reproducing the leak required setting connectionsPerBroker > 1. - attempt to cover also other cases where the ConsumerImpl instance wasn't de-registered properly. - whenever client.cleanupConsumer(this) is called, the de-registration from ClientCnx (where the ConsumerImpl was previously registered) should be made - fix a bug in closeAsync where cleanup was skipped when the channel was closed or an exception occurred in sending the close consumer request ### Verifying this change - this PR doesn't include new tests. There is a manual test in lhotari@aff700c that has been used to reproduce the memory leak and to verify that no ConsumerImpl references are leaked after the fix has been applied.
wolfstudy
pushed a commit
that referenced
this pull request
Oct 30, 2020
…cover all remaining request types (GetLastMessageId, GetTopics, GetSchema, GetOrCreateSchema) (#8149) * Add timeout handling for all other request types in ClientCnx Master Issue: #8138 Since one of the root causes for #8138 is that there isn't timeout handling for GetLastMessageId requests, this PR adds that. The PR improves timeout handling in ClientCnx to cover all remaining request types that don't currently have timeout handling. The request types are: - GetLastMessageId - GetTopics - GetSchema - GetOrCreateSchema The existing requestTimeoutQueue solution to handle "ordinary" command requests has been extended to cover all the requests types. This change added tests and can be verified as follows: - Added unit tests in a new test class ClientCnxRequestTimeoutQueueTest to test request timeout handling. - `cd pulsar-client; mvn test -Dtest=ClientCnxRequestTimeoutQueueTest` (cherry picked from commit bf6a88e)
huangdx0726
pushed a commit
to huangdx0726/pulsar
that referenced
this pull request
Nov 13, 2020
…cover all remaining request types (GetLastMessageId, GetTopics, GetSchema, GetOrCreateSchema) (apache#8149) * Add timeout handling for all other request types in ClientCnx Master Issue: apache#8138 ### Motivation Since one of the root causes for apache#8138 is that there isn't timeout handling for GetLastMessageId requests, this PR adds that. The PR improves timeout handling in ClientCnx to cover all remaining request types that don't currently have timeout handling. The request types are: - GetLastMessageId - GetTopics - GetSchema - GetOrCreateSchema ### Modifications The existing requestTimeoutQueue solution to handle "ordinary" command requests has been extended to cover all the requests types. ### Verifying this change This change added tests and can be verified as follows: - Added unit tests in a new test class ClientCnxRequestTimeoutQueueTest to test request timeout handling. - `cd pulsar-client; mvn test -Dtest=ClientCnxRequestTimeoutQueueTest`
huangdx0726
pushed a commit
to huangdx0726/pulsar
that referenced
this pull request
Nov 13, 2020
Fixes apache#8138 ### Motivation The issue apache#8138 is about multiple memory leaks happening when using the Pulsar Java Client Reader API. One of the leaks was previously fixed by apache#8149 . This PR fixes two remaining ConsumerImpl memory leak issues: - leaks of ConsumerImpl instances via active timers (unAckedMessageTracker, acknowledgmentsGroupingTracker, stats) - leaks of ConsumerImpl instances via ClientCnx when the consumer gets closed by the Broker and the consumer picks a new connection to use ### Modifications - modifications to fix the leaks of ConsumerImpl via active timers: - calls to "closeConsumerTasks" were added in the locations where they were missing - modifications to fix the leaks of ConsumerImpl via ClientCnx - a solution was added to track to which ClientCnx the ConsumerImpl has been registered. When the ConsumerImpl gets registered to a new ClientCnx, the same instance is unregistered from the previous ClientCnx it was previously registered to. - when the connection gets switched, the ConsumerImpl must be de-registered from the previous ClientCnx where it got registered previously - the connection gets switched with the current seek implementation that disconnects the consumer after a seek. - the new connection is picked randomly of the existing connections in the connection pool. Therefore reproducing the leak required setting connectionsPerBroker > 1. - attempt to cover also other cases where the ConsumerImpl instance wasn't de-registered properly. - whenever client.cleanupConsumer(this) is called, the de-registration from ClientCnx (where the ConsumerImpl was previously registered) should be made - fix a bug in closeAsync where cleanup was skipped when the channel was closed or an exception occurred in sending the close consumer request ### Verifying this change - this PR doesn't include new tests. There is a manual test in lhotari@aff700c that has been used to reproduce the memory leak and to verify that no ConsumerImpl references are leaked after the fix has been applied.
merlimat
pushed a commit
to merlimat/pulsar
that referenced
this pull request
Dec 19, 2020
…cover all remaining request types (GetLastMessageId, GetTopics, GetSchema, GetOrCreateSchema) (apache#8149) * Add timeout handling for all other request types in ClientCnx Master Issue: apache#8138 ### Motivation Since one of the root causes for apache#8138 is that there isn't timeout handling for GetLastMessageId requests, this PR adds that. The PR improves timeout handling in ClientCnx to cover all remaining request types that don't currently have timeout handling. The request types are: - GetLastMessageId - GetTopics - GetSchema - GetOrCreateSchema ### Modifications The existing requestTimeoutQueue solution to handle "ordinary" command requests has been extended to cover all the requests types. ### Verifying this change This change added tests and can be verified as follows: - Added unit tests in a new test class ClientCnxRequestTimeoutQueueTest to test request timeout handling. - `cd pulsar-client; mvn test -Dtest=ClientCnxRequestTimeoutQueueTest`
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Master Issue: #8138
Motivation
Since one of the root causes for #8138 is that there isn't timeout handling for GetLastMessageId requests, this PR adds that.
The PR improves timeout handling in ClientCnx to cover all remaining request types that don't currently have timeout handling. The request types are:
- GetLastMessageId
- GetTopics
- GetSchema
- GetOrCreateSchema
Modifications
The existing requestTimeoutQueue solution to handle "ordinary" command requests has been extended to cover all the requests types.
Verifying this change
This change added tests and can be verified as follows:
cd pulsar-client; mvn test -Dtest=ClientCnxRequestTimeoutQueueTest