Skip to content

Conversation

@ZanderXu
Copy link
Contributor

@ZanderXu ZanderXu commented Feb 8, 2023

Description of PR

Jira: HADOOP-18624

Leaked calls may cause ObserverNameNode OOM.

During Observer Namenode tailing edits from JournalNode, it will cancel slow request with an interruptException if there are a majority of successful responses.

There is a bug in Client.java, it will not clean the interrupted call from the calls. The leaked calls may cause ObserverNameNode OOM.

public void sendRpcRequest(final Call call)
        throws InterruptedException, IOException {
      if (shouldCloseConnection.get()) {
        return;
      }
      ...
      while (!shouldCloseConnection.get()) {
        // the bug is here, if the rpcRequestQueue is full and there is a interrupt exception, it will cause that the call is only stored into the calls, and not inserted into rpcRequestQueue, so that there is no chance to remove it from the calls until the connection is closed.
        if (rpcRequestQueue.offer(Pair.of(call, buf), 1, TimeUnit.SECONDS)) {
          break;
        }
      }
    }

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 2s codespell was not available.
+0 🆗 detsecrets 0m 2s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 33s trunk passed
+1 💚 compile 25m 24s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 21m 37s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 1m 6s trunk passed
+1 💚 mvnsite 1m 39s trunk passed
+1 💚 javadoc 1m 8s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 40s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 37s trunk passed
+1 💚 shadedclient 28m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 6s the patch passed
+1 💚 compile 24m 30s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 24m 30s the patch passed
+1 💚 compile 21m 34s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 21m 34s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 0s the patch passed
+1 💚 mvnsite 1m 35s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 41s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 47s the patch passed
+1 💚 shadedclient 28m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 9s hadoop-common in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
231m 9s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/artifact/out/Dockerfile
GITHUB PR #5367
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1bc85cfd5cea 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ad4c2ca
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/testReport/
Max. process+thread count 1801 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Member

@simbadzina simbadzina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZanderXu . Could you please a test case that is fixed by your change. TestIPC#checkConnect has a pattern you can use to create failing calls. Then you can assert that client.connect.calls.isEmpty()

@ZanderXu
Copy link
Contributor Author

@ZanderXu . Could you please a test case that is fixed by your change. TestIPC#checkConnect has a pattern you can use to create failing calls. Then you can assert that client.connect.calls.isEmpty()

@simbadzina sure, I will complete it later.

throw e;
} finally {
if (!success) {
connection.calls.remove(call.id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendRpcRequest(call) will try to add a pair of <call,buf> into rpcRequestQueue. Should we remove that on failure as well?

rpcRequestQueue.put(Pair.of(call, buf));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xinglin for your comment.

We don't need to remove the call from rpcRequestQueue, because the rpcRequestThread will poll it and send to the connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the change makes sense to me. Thanks,

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 2s trunk passed
+1 💚 compile 17m 23s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 15m 47s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 1m 7s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 42s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 2m 43s trunk passed
+1 💚 shadedclient 25m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 52s the patch passed
+1 💚 compile 16m 43s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 16m 43s the patch passed
+1 💚 compile 15m 44s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 15m 44s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 1s the patch passed
+1 💚 mvnsite 1m 31s the patch passed
+1 💚 javadoc 1m 0s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 42s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 2m 34s the patch passed
+1 💚 shadedclient 25m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 12s hadoop-common in the patch passed.
+1 💚 asflicense 0m 54s The patch does not generate ASF License warnings.
187m 52s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/artifact/out/Dockerfile
GITHUB PR #5367
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ea151b9d8da6 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ad4c2ca
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/testReport/
Max. process+thread count 1253 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5367/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@nstang01
Copy link

nstang01 commented Aug 13, 2024

I encountered a similar problem, will this mr be merged? @ZanderXu

@github-actions
Copy link
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 27, 2025
@tasanuma
Copy link
Member

I'd like to keep this PR, since it is important.

@tasanuma tasanuma removed the Stale label Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants