-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-18624. Leaked calls in Client.java may cause ObserverNameNode OOM #5367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
|
💔 -1 overall
This message was automatically generated. |
simbadzina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ZanderXu . Could you please a test case that is fixed by your change. TestIPC#checkConnect has a pattern you can use to create failing calls. Then you can assert that client.connect.calls.isEmpty()
@simbadzina sure, I will complete it later. |
| throw e; | ||
| } finally { | ||
| if (!success) { | ||
| connection.calls.remove(call.id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sendRpcRequest(call) will try to add a pair of <call,buf> into rpcRequestQueue. Should we remove that on failure as well?
rpcRequestQueue.put(Pair.of(call, buf));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xinglin for your comment.
We don't need to remove the call from rpcRequestQueue, because the rpcRequestThread will poll it and send to the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the change makes sense to me. Thanks,
|
💔 -1 overall
This message was automatically generated. |
|
I encountered a similar problem, will this mr be merged? @ZanderXu |
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
I'd like to keep this PR, since it is important. |
Description of PR
Jira: HADOOP-18624
Leaked calls may cause ObserverNameNode OOM.
During Observer Namenode tailing edits from JournalNode, it will cancel slow request with an interruptException if there are a majority of successful responses.
There is a bug in Client.java, it will not clean the interrupted call from the calls. The leaked calls may cause ObserverNameNode OOM.