@madrob
@madrob madrob commented Feb 22, 2022

Description of PR

Provides additional clarity in logs when debugging MiniDFSCluster with
multiple in-process DataNodes.

How was this patch tested?

Visual inspection of log output.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?

CommandProcessingThread(BPServiceActor actor) {
  super("Command processor");
  setName("Command processor-" + getId());
Member

Why not use super()?

Author

Because the thread id is assigned during construction, so getId() is not available to us yet at the point where super() is called.
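The constraint described here can be shown with a small standalone sketch. The class below is a stand-in written for illustration, not the real Hadoop `CommandProcessingThread`: it takes no `BPServiceActor` and does no work. It only shows that the id can be appended with `setName()` once the `super(...)` call has returned.

```java
public class CommandProcessorNaming {

  // Illustrative stand-in for the thread class in this PR (assumption: simplified,
  // no BPServiceActor argument and no real command processing).
  static class CommandProcessingThread extends Thread {
    CommandProcessingThread() {
      // getId() is an instance method, so it cannot be used inside the
      // super(...) argument list: the object is not constructed yet.
      super("Command processor");
      // Once super(...) has returned, the thread id exists and can be appended.
      setName("Command processor-" + getId());
    }

    @Override
    public void run() {
      // Intentionally empty; only the naming is of interest here.
    }
  }

  public static void main(String[] args) {
    Thread first = new CommandProcessingThread();
    Thread second = new CommandProcessingThread();
    // The two names differ because each thread has its own numeric id.
    System.out.println(first.getName());
    System.out.println(second.getName());
  }
}
```

The numeric suffix depends on how many threads the JVM has already created, so the exact values will vary between runs.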

Contributor

Thanks @madrob for your work. I don't think getId() adds any useful information.
IMO, it may be better to change it as follows:
super("Command processor for " + nnAddr);

Author

Thanks for the suggestion, but I don't think that would be an improvement. Let me explain the motivation in more detail.

The id is just the numeric Java thread id, and it is enough to distinguish the command processors from one another when multiple DNs run in the same process, as in MiniDFSCluster during unit tests.

Putting the NN address in would not disambiguate the logs because they would all be for the same NN still. It would give more information, sure, but not actually helpful information.

With my change, the log messages would have (Command processor-56) or -68 or whatever the thread was. Again, just enough to differentiate them from one another, which is what I needed for tracing their lifecycle and operation.

If there's a DN address we can use in the thread name instead, then that's good too but I don't know enough about Hadoop internals to find that.

Contributor

> Putting the NN address in would not disambiguate the logs because they would all be for the same NN still. It would give more information, sure, but not actually helpful information.

Sorry, I don't follow this. IIUC, each command processor now maps one-to-one to a block pool, and nnAddr includes both the hostname and the port. So the names could differ from each other even in the MiniDFSCluster framework, right? Also, having nnAddr in the name could help with debugging when this thread runs into issues.
Anyway, I don't object to adding getId() here as well. We could add both of them to the thread name.

Member

I think the issue here is that, in the case of MiniDFSCluster, all datanodes log to one place. That means if we spin up a MiniDFSCluster with 9 datanodes, all Command processor threads will log to the same place, and we can't distinguish which thread belongs to which datanode.
DN1 through DN9 will all have the same thread name. If we add the namenode address, the thread names will still be the same, right? All datanodes in a single MiniDFSCluster will be connected to the same set of namenodes, right?

Maybe adding the DN address would be a good idea?

Author

Yes, exactly, @ayushtkn. All of the DNs will have the same NN, so they will all log the same thread name string.

For adding DN address, I'm not sure which fields in DN are considered stable for this purpose. Maybe setName("Command processor for " + dn.getDisplayName()); will work?
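As a rough illustration of the DN-address idea, the sketch below names each thread after a per-DataNode display string. It does not use Hadoop classes: `commandProcessorName` is a hypothetical helper, and the `host:port` strings are made-up values standing in for whatever `dn.getDisplayName()` would return for each in-process DataNode.

```java
public class DataNodeThreadNames {

  // Hypothetical helper (not part of Hadoop): builds a thread name from a
  // DataNode display string, assumed here to look like "host:port".
  static String commandProcessorName(String dnDisplayName) {
    return "Command processor for " + dnDisplayName;
  }

  public static void main(String[] args) throws InterruptedException {
    // Made-up display names standing in for three in-process DataNodes.
    String[] dnDisplayNames = {"127.0.0.1:50010", "127.0.0.1:50011", "127.0.0.1:50012"};

    for (String dn : dnDisplayNames) {
      // Each thread prints its own name, the way a log pattern with a
      // thread-name field would.
      Runnable work = () ->
          System.out.println("(" + Thread.currentThread().getName() + ") processing command");
      Thread t = new Thread(work, commandProcessorName(dn));
      t.start();
      t.join(); // run one at a time so the output order is stable
    }
  }
}
```

With names like these, log lines from different DataNodes in the same JVM can be told apart by the address rather than by an opaque numeric id.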

Contributor

> I think the issue here is that, in the case of MiniDFSCluster, all datanodes log to one place. That means if we spin up a MiniDFSCluster with 9 datanodes, all Command processor threads will log to the same place, and we can't distinguish which thread belongs to which datanode. DN1 through DN9 will all have the same thread name. If we add the namenode address, the thread names will still be the same, right? All datanodes in a single MiniDFSCluster will be connected to the same set of namenodes, right?
>
> Maybe adding the DN address would be a good idea?

Thanks @madrob @ayushtkn for the discussion and information. From my side, I am more concerned with how to tell which namespace each of these threads serves when setting up a production cluster with Federation. This may help when digging into unexpected issues. I totally agree with adding dn.getDisplayName() or getId() or something else that makes the MiniDFSCluster logs more readable. IMO we should also consider the production environment at the same time. FYI. Of course this is not a fatal issue, and my comment is not a blocker. Please feel free to go ahead if possible.

Member

Makes sense, we should handle federation setup as well. That is what will help in actual prod env.
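One way to cover both concerns raised in this thread (telling DataNodes apart in MiniDFSCluster and telling namespaces apart under Federation) would be to put both addresses in the name. The sketch below is only an illustration of that idea: the name format, the helper `commandProcessorName`, and the example hosts and ports are all assumptions, not anything the PR implements.

```java
import java.net.InetSocketAddress;

public class FederatedThreadNames {

  // Hypothetical helper (not part of Hadoop): combines a DataNode display
  // string with the NameNode address the thread talks to.
  static String commandProcessorName(InetSocketAddress nnAddr, String dnDisplayName) {
    return "Command processor for DN " + dnDisplayName
        + " to NN " + nnAddr.getHostString() + ":" + nnAddr.getPort();
  }

  public static void main(String[] args) {
    // Made-up addresses: two NameNodes (two namespaces) and two DataNodes.
    InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
    InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
    String[] dnDisplayNames = {"127.0.0.1:50010", "127.0.0.1:50011"};

    // One command processor per (DataNode, NameNode) pair; every name is unique.
    for (String dn : dnDisplayNames) {
      System.out.println(commandProcessorName(nn1, dn));
      System.out.println(commandProcessorName(nn2, dn));
    }
  }
}
```

The trade-off is longer thread names in every log line, which may or may not be acceptable depending on the log pattern in use.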

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 1m 4s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 37m 17s | | trunk passed |
| +1 💚 | compile | 1m 38s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | compile | 1m 27s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 1m 7s | | trunk passed |
| +1 💚 | mvnsite | 1m 45s | | trunk passed |
| +1 💚 | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 3m 52s | | trunk passed |
| +1 💚 | shadedclient | 30m 48s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 42s | | the patch passed |
| +1 💚 | compile | 1m 47s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javac | 1m 47s | | the patch passed |
| +1 💚 | compile | 1m 35s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 1m 35s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 4s | | the patch passed |
| +1 💚 | mvnsite | 1m 40s | | the patch passed |
| +1 💚 | javadoc | 1m 6s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 4m 18s | | the patch passed |
| +1 💚 | shadedclient | 32m 17s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 363m 51s | | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 489m 56s | | |

Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/1/artifact/out/Dockerfile
GITHUB PR #4016
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux c5f1d2455806 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a97f1ae
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/1/testReport/
Max. process+thread count 2016 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 1m 26s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 38m 7s | | trunk passed |
| +1 💚 | compile | 1m 37s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | compile | 1m 25s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 1m 3s | | trunk passed |
| +1 💚 | mvnsite | 1m 36s | | trunk passed |
| +1 💚 | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 1m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 3m 33s | | trunk passed |
| +1 💚 | shadedclient | 26m 27s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 25s | | the patch passed |
| +1 💚 | compile | 1m 31s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javac | 1m 31s | | the patch passed |
| +1 💚 | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 1m 19s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 54s | | the patch passed |
| +1 💚 | mvnsite | 1m 24s | | the patch passed |
| +1 💚 | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 💚 | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 3m 35s | | the patch passed |
| +1 💚 | shadedclient | 25m 54s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 370m 52s | | hadoop-hdfs in the patch passed. |
| +1 💚 | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 484m 46s | | |

Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/2/artifact/out/Dockerfile
GITHUB PR #4016
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux b0f88b475aa7 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6cbf550
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/2/testReport/
Max. process+thread count 2076 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4016/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Contributor

@tomscut tomscut left a comment

If you want to distinguish threads, we might have to add nnAddr + dn.getDisplayName(), right?

@madrob
Author

madrob commented Feb 28, 2022

> If you want to distinguish threads, we might have to add nnaddr+ dn.getDisplayName, right?

I don't think that is necessary, but it could be a great addition in a future PR if you find you need it for your use cases.

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Nov 16, 2025
@github-actions github-actions bot closed this Nov 17, 2025
6 participants