Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-16704. Datanode return empty response instead of NPE for GetVolumeInfo during restarting #4661

Merged
merged 2 commits into from
Aug 15, 2022

Conversation

ZanderXu
Copy link
Contributor

Description of PR

During datanode starting, I found some NPE in logs:

Caused by: java.lang.NullPointerException: Storage not yet initialized
    at org.apache.hadoop.thirdparty.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:899)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.getVolumeInfo(DataNode.java:3533)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175) 

Because the storage of datanode not yet initialized when we trying to get metrics of datanode, and related code as below:

@Override // DataNodeMXBean
public String getVolumeInfo() {
  Preconditions.checkNotNull(data, "Storage not yet initialized");
  return JSON.toString(data.getVolumeInfoMap());
} 

The logic is ok, but I feel that the more reasonable logic should be return an empty response instead of NPE, because InfoServer will be started before initBlockPool.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 3s trunk passed
+1 💚 compile 1m 44s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 1m 37s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 1m 45s trunk passed
+1 💚 javadoc 1m 26s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 45s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 38s trunk passed
+1 💚 shadedclient 23m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 27s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 1m 27s the patch passed
+1 💚 compile 1m 21s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 0s the patch passed
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 javadoc 0m 59s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 32s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 24s the patch passed
+1 💚 shadedclient 22m 51s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 237m 31s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 15s The patch does not generate ASF License warnings.
347m 38s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/1/artifact/out/Dockerfile
GITHUB PR #4661
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 776d4764cb81 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7ce6d940f9e0112b441a66bc6948e8895db0330e
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/1/testReport/
Max. process+thread count 3331 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this a behaviour change, and guess governed by two policies, One is change of public API behaviour and second change in Metrics values as well. Earlier when the storage wasn't initialised the call was failing now it will be returning an empty string. So, that is a behaviour change and can be confused with a state when data.getVolumeInfoMap() is empty.

Extra Stuff: This log line just below is wrong, incorrect placeholder, can fix in a separate jira if interested. :-)

LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);

@ZanderXu
Copy link
Contributor Author

ZanderXu commented Aug 2, 2022

@ayushtkn Thank you very much for helping me review it.

Yes, you are right, we should return an empty value for this case. How about change it same with getDiskBalancerStatus, such as:

@Override // DataNodeMXBean
  public String getVolumeInfo() {
    if (data == null) {
      LOG.debug("Storage not yet initialized.");
      return ""; // it's different with JSON.toString(new HashMap<String, Object>())
    }
    return JSON.toString(data.getVolumeInfoMap());
  }

Developers or SREs are very sensitive to NPE, so I feel we shouldn't out put it in this expected situation.

Extra Stuff: This log line just below is wrong, incorrect placeholder, can fix in a separate jira if interested. :-)

Copy, sir. I will fix it in a separate Jira.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current change makes sense to me.
If no test failures. Changes LGTM.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 45s trunk passed
+1 💚 compile 1m 39s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 1m 27s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 21s trunk passed
+1 💚 mvnsite 1m 33s trunk passed
+1 💚 javadoc 1m 13s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 40s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 47s trunk passed
+1 💚 shadedclient 23m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 19s the patch passed
+1 💚 compile 1m 22s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 1m 22s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 0s the patch passed
+1 💚 mvnsite 1m 28s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 24s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 38s the patch passed
+1 💚 shadedclient 25m 31s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 236m 38s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 15s The patch does not generate ASF License warnings.
350m 10s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/2/artifact/out/Dockerfile
GITHUB PR #4661
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 23bc297674f0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 69287015aa98557ef950041a8a4defe7613417dc
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/2/testReport/
Max. process+thread count 3058 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@Hexiaoqiao Hexiaoqiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1 from my side.

@Hexiaoqiao
Copy link
Contributor

@ZanderXu Thanks for your contribution. IIRC, there is also some other precondition checks before initialized, Would you mind to give another check? such as org.apache.hadoop.hdfs.server.datanode.DataNode#initReplicaRecovery or replicaHandler = datanode.data.createRbw(storageType, storageId, block, allowLazyPersist); (which datanode.data is null before DataNode initialized) at here.

@ZanderXu
Copy link
Contributor Author

ZanderXu commented Aug 3, 2022

@Hexiaoqiao Thanks for your review.

Would you mind to give another check?

Sure, I will check them. BTW, If it's ok, can I fix them in a new PR? I hope this PR is just used to fix NPE in getVolumeInfo().

@ayushtkn
Copy link
Member

ayushtkn commented Aug 7, 2022

Are we good here? or Waiting for some more modifications?

@ZanderXu
Copy link
Contributor Author

ZanderXu commented Aug 8, 2022

Yes, we are here, and I have create a new PR-4699 to fix other cases in DataNode.java.

@ayushtkn @Hexiaoqiao Can you help me review them and give me a suggestion that we fix them in two PR or one?

@ZanderXu
Copy link
Contributor Author

@Hexiaoqiao @ayushtkn Master, ping, please help me merge this PR and review PR-4699? Thanks

@Hexiaoqiao Hexiaoqiao merged commit b1d4af2 into apache:trunk Aug 15, 2022
@Hexiaoqiao
Copy link
Contributor

Committed to trunk. Thanks @ZanderXu for your contribution, Thanks @ayushtkn for your reviews.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 1s trunk passed
+1 💚 compile 1m 43s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 1m 38s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 27s trunk passed
+1 💚 mvnsite 1m 47s trunk passed
+1 💚 javadoc 1m 24s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 50s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 43s trunk passed
+1 💚 shadedclient 23m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 27s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 1m 27s the patch passed
+1 💚 compile 1m 20s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 mvnsite 1m 26s the patch passed
+1 💚 javadoc 0m 59s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 29s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 25s the patch passed
+1 💚 shadedclient 22m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 237m 9s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 15s The patch does not generate ASF License warnings.
347m 10s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/3/artifact/out/Dockerfile
GITHUB PR #4661
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f2e8f1a1b377 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2af6c2b
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/3/testReport/
Max. process+thread count 3385 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4661/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…meInfo during restarting (apache#4661). Contributed by ZanderXu.

Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants