
HDFS-16942. Send error to datanode if FBR is rejected due to bad lease #5460

Merged

merged 5 commits into apache:trunk
Mar 11, 2023

Conversation

sodonnel
Contributor

@sodonnel sodonnel commented Mar 7, 2023

Description of PR

When a datanode sends a FBR to the namenode, it requires a lease to send it. On a couple of busy clusters, we have seen an issue where the DN is somehow delayed in sending the FBR after requesting the lease. The NN then rejects the FBR and logs a message to that effect, but from the Datanode's point of view, it thinks the report was successful and does not try to send another report until the 6 hour default interval has passed.

If this happens to a few DNs, there can be missing and under-replicated blocks, further adding to the cluster load. Even worse, I have seen DNs join the cluster with zero blocks, so it is not obvious the under-replication is caused by a lost FBR, as all DNs appear to be up and running.

I believe we should propagate an error back to the DN if the FBR is rejected; that way, the DN can request a new lease and try again.
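The proposed handshake can be sketched roughly as follows. This is a toy model I wrote purely for illustration, not the real Hadoop classes: only the InvalidBlockReportLeaseException name comes from this PR, while ToyNameNode/ToyDataNode and their methods are made-up stand-ins.

```java
// Simplified model of the NN/DN lease handshake this PR proposes.
// A rejected FBR now surfaces as an exception, and the DN reacts by
// dropping its lease id so it requests a fresh one and retries.

class InvalidBlockReportLeaseException extends RuntimeException {
  InvalidBlockReportLeaseException(long reportId, long leaseId) {
    super("Block report 0x" + Long.toHexString(reportId)
        + " was rejected: lease 0x" + Long.toHexString(leaseId)
        + " is invalid or expired");
  }
}

class ToyNameNode {
  private long currentLease = 0;

  long requestLease() { return ++currentLease; }

  void blockReport(long reportId, long leaseId) {
    if (leaseId != currentLease) {          // stale or expired lease
      throw new InvalidBlockReportLeaseException(reportId, leaseId);
    }
  }
}

class ToyDataNode {
  long fullBlockReportLeaseId = 0;
  int reportsSent = 0;

  // One heartbeat cycle: get a lease if needed, try the FBR, and on
  // rejection reset the lease id so the next cycle retries.
  void heartbeat(ToyNameNode nn) {
    if (fullBlockReportLeaseId == 0) {
      fullBlockReportLeaseId = nn.requestLease();
    }
    try {
      nn.blockReport(42L, fullBlockReportLeaseId);
      reportsSent++;
      fullBlockReportLeaseId = 0;           // done until next interval
    } catch (InvalidBlockReportLeaseException e) {
      fullBlockReportLeaseId = 0;           // retry with a fresh lease
    }
  }
}
```

The key behaviour is in the catch block: once the rejection is visible to the DN, zeroing fullBlockReportLeaseId makes the next heartbeat request a fresh lease instead of silently waiting for the next 6-hour interval.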

How was this patch tested?

Modified a test and added a new test. Also validated manually by changing the code to always throw the InvalidLease exception to ensure the DN handled it, reset the lease ID and retried.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 48s trunk passed
+1 💚 compile 1m 30s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 21s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 1m 5s trunk passed
+1 💚 mvnsite 1m 33s trunk passed
+1 💚 javadoc 1m 6s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 30s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 41s trunk passed
+1 💚 shadedclient 22m 57s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 18s the patch passed
+1 💚 compile 1m 20s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 1m 20s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 1m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 51s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 79 unchanged - 0 fixed = 81 total (was 79)
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 javadoc 0m 53s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 31s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 26s the patch passed
+1 💚 shadedclient 23m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 243m 39s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
354m 37s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/1/artifact/out/Dockerfile
GITHUB PR #5460
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e8a6e148e1c3 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b201655
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/1/testReport/
Max. process+thread count 3042 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@sodonnel
Contributor Author

sodonnel commented Mar 7, 2023

Not sure what to do about this checkstyle error:

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InvalidBlockReportLeaseException.java:1:/**: Missing package-info.java file. [JavadocPackage]

If I add the file:

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/package-info.java

I get an enforcer error:

[INFO] -------< org.apache.hadoop:hadoop-client-check-test-invariants >--------
[INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.4.0-SNAPSHOT [106/113]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ hadoop-client-check-test-invariants ---
[INFO] Deleting /Users/sodonnell/source/upstream_hadoop/hadoop-client-modules/hadoop-client-check-test-invariants/target
[INFO] Deleting /Users/sodonnell/source/upstream_hadoop/hadoop-client-modules/hadoop-client-check-test-invariants (includes = [dependency-reduced-pom.xml], excludes = [])
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-client-check-test-invariants ---
[INFO] Executing tasks

main:
    [mkdir] Created dir: /Users/sodonnell/source/upstream_hadoop/hadoop-client-modules/hadoop-client-check-test-invariants/target/test-dir
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
[INFO] Adding ignore: module-info
[INFO] Adding ignore: META-INF/versions/*/module-info
[INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
[INFO]   Adding ignore: *
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.4.0-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-api:jar:3.4.0-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/hdfs/server/protocol/package-info.class

@@ -791,6 +792,9 @@ private void offerService() throws Exception {
            shouldServiceRun = false;
            return;
          }
          if (InvalidBlockReportLeaseException.class.getName().equals(reClass)) {
            fullBlockReportLeaseId = 0;
Contributor

@sodonnel Thanks for your work here. Do we also need to set forceFullBlockReport to true here? Otherwise, the datanode will not send a block report until the next 6-hour interval by default, right?

Contributor Author

At line 717, we can see where it attempts to get a lease from the heartbeat if the lease in the DN == 0:

   boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
                  scheduler.isBlockReportDue(startTime);

So it's the isBlockReportDue that controls this. Then later, if we have a non-zero lease, it will try to create the block report:

        boolean forceFullBr =
            scheduler.forceFullBlockReport.getAndSet(false);
        if (forceFullBr) {
          LOG.info("Forcing a full block report to " + nnAddr);
        }
        if ((fullBlockReportLeaseId != 0) || forceFullBr) {
          cmds = blockReport(fullBlockReportLeaseId);
          fullBlockReportLeaseId = 0;
        }

It's really the isBlockReportDue() method that controls whether a new one should be sent or not, and that is based on the time since the last one. Then in blockReport(), it updates the time after a successful block report, but if it gets an exception, like this change causes, it will not update the time and so it will try again on the next heartbeat if it gets a new lease.

I think forceFullBlockReport is only used for tests, or for the command to force a DN block report from the CLI.
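The timing argument above can be sketched like this. It is my own simplified stand-in with made-up names, not the real scheduler (the real code lives in BPServiceActor and its Scheduler): the report is "due" when the interval has elapsed since the last successful report, and a failed report leaves that timestamp untouched.

```java
// Toy version of the DN block report scheduling logic described above.
class ToyScheduler {
  final long intervalMs;
  long lastBlockReportMs;

  ToyScheduler(long intervalMs, long startMs) {
    this.intervalMs = intervalMs;
    this.lastBlockReportMs = startMs;
  }

  // A full block report is due once the interval has elapsed since
  // the last successful report.
  boolean isBlockReportDue(long nowMs) {
    return nowMs - lastBlockReportMs >= intervalMs;
  }

  // Only called after a report is accepted by the NN; a rejected
  // report skips this, so the report stays "due".
  void markReportSucceeded(long nowMs) {
    lastBlockReportMs = nowMs;
  }
}
```

Because a rejected report never calls markReportSucceeded, isBlockReportDue stays true, so the DN asks for a new lease on the very next heartbeat rather than waiting out the full 6-hour interval.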

Contributor Author

Also, do you have any idea about fixing the checkstyle issue? As I mentioned above, trying to add a package-info.java file broke my compile locally.

Contributor

As we don't expect to reach here frequently (hopefully the datanode is able to acquire the lease successfully most of the time), perhaps we can add a one-liner log to indicate that the particular FBR went through this trouble (i.e. log the report id and lease id)? (just in case it helps debug further)

Contributor

Or maybe reportId and leaseId can be added as constructor arguments to InvalidBlockReportLeaseException. That way the RemoteException logged in offerService will likely print them anyway?

Contributor

It's really the isBlockReportDue() method that controls whether a new one should be sent or not, and that is based on the time since the last one. Then in blockReport(), it updates the time after a successful block report, but if it gets an exception, like this change causes, it will not update the time and so it will try again on the next heartbeat if it gets a new lease.

Thanks for the detailed explanation. Makes sense to me.

perhaps we can add one liner log to indicate that the particular FBR went through this trouble (i.e. log report id and lease id)

+1 from my side.

Contributor Author

I added the report and lease ID into the exception message, so it will get logged as part of the exception.

Still stuck on that checkstyle violation :-(

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 41s trunk passed
+1 💚 compile 1m 26s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 24s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 1m 7s trunk passed
+1 💚 mvnsite 1m 28s trunk passed
+1 💚 javadoc 1m 10s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 25s trunk passed
+1 💚 shadedclient 22m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 15s the patch passed
+1 💚 compile 1m 19s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 1m 19s the patch passed
+1 💚 compile 1m 11s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 1m 11s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 51s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 79 unchanged - 0 fixed = 80 total (was 79)
+1 💚 mvnsite 1m 22s the patch passed
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 22s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 17s the patch passed
+1 💚 shadedclient 22m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 238m 22s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
345m 2s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/2/artifact/out/Dockerfile
GITHUB PR #5460
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1c427e3c3acb 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 15851db
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/2/testReport/
Max. process+thread count 3164 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Contributor

Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.4.0-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-api:jar:3.4.0-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/hdfs/server/protocol/package-info.class

I believe this could be avoided by excluding it from either of the poms:

--- a/hadoop-client-modules/hadoop-client-api/pom.xml
+++ b/hadoop-client-modules/hadoop-client-api/pom.xml
@@ -126,6 +126,12 @@
                         <exclude>org/apache/hadoop/yarn/client/api/package-info.class</exclude>
                       </excludes>
                     </filter>
+                    <filter>
+                      <artifact>org.apache.hadoop:hadoop-hdfs</artifact>
+                      <excludes>
+                        <exclude>org/apache/hadoop/hdfs/server/protocol/package-info.class</exclude>
+                      </excludes>
+                    </filter>
                   </filters>
                   <relocations>
                     <relocation>

For instance, the above patch might help. I haven't tested this but I guess it might work.

@virajjasani
Contributor

virajjasani commented Mar 8, 2023

@sodonnel also I am curious, was it just a specific log (only one case, e.g. the lease was expired) or a combination of logs from checkLease(DatanodeDescriptor dn, long monotonicNowMs, long id) that you have seen in various issues?

I wonder if lease expiry or invalid lease are worth having some dedicated metrics in NameNodeActivity (maybe not, as with this patch the subsequent attempt by the BP actor should anyway have a new lease id acquired from the heartbeat response before it reattempts sending the FBR).

@sodonnel
Contributor Author

sodonnel commented Mar 9, 2023

@sodonnel also I am curious, was it just a specific log (only one case, e.g. the lease was expired) or a combination of logs from checkLease(DatanodeDescriptor dn, long monotonicNowMs, long id) that you have seen in various issues?

I wonder if lease expiry or invalid lease are worth having some dedicated metrics in NameNodeActivity (maybe not, as with this patch the subsequent attempt by the BP actor should anyway have a new lease id acquired from the heartbeat response before it reattempts sending the FBR).

In the examples I saw, it was expired leases that caused the problem. However, the namenode was under significant pressure when it happened. In one example, it was actually the SBNN which was rejecting the reports. Tailing the edits was taking frequent long locks (over 300 seconds at times), which was beyond the lease expiry.

In another example, it was the ANN after startup. I am not sure, but I think the system was perhaps just out of safemode with many block reports still outstanding, and then between under-replication and IBRs, contention on the NN lock seemed to block the FBRs until the lease expired.

@sodonnel
Contributor Author

sodonnel commented Mar 9, 2023

Duplicate classes found:

  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.4.0-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-api:jar:3.4.0-SNAPSHOT:compile
  Duplicate classes:
    org/apache/hadoop/hdfs/server/protocol/package-info.class

I believe this could be avoided by excluding it from either of the poms:

--- a/hadoop-client-modules/hadoop-client-api/pom.xml
+++ b/hadoop-client-modules/hadoop-client-api/pom.xml
@@ -126,6 +126,12 @@
                         <exclude>org/apache/hadoop/yarn/client/api/package-info.class</exclude>
                       </excludes>
                     </filter>
+                    <filter>
+                      <artifact>org.apache.hadoop:hadoop-hdfs</artifact>
+                      <excludes>
+                        <exclude>org/apache/hadoop/hdfs/server/protocol/package-info.class</exclude>
+                      </excludes>
+                    </filter>
                   </filters>
                   <relocations>
                     <relocation>

For instance, the above patch might help. I haven't tested this but I guess it might work.

@virajjasani This isn't working. I added what you suggested, but still the same error. I also don't understand why this style warning is appearing, as it's not a new package folder, and it has plenty of other classes in it already.

@virajjasani
Contributor

I also don't understand why this style warning is appearing, as it's not a new package folder, and it has plenty of other classes in it already.

I think it's appearing because it was not resolved earlier or it was ignored. For instance, I see the last time the module was touched was by this PR #4252 and in the last QA result, I see checkstyle issues, but of course since the PR is quite old, we can't see what the error was (it reports 404 for the checkstyle report page).
I guess it's not that important either.

This isn't working. I added what you suggested, but still the same error.

Strange, I think this one should hopefully work:

--- a/hadoop-client-modules/hadoop-client-api/pom.xml
+++ b/hadoop-client-modules/hadoop-client-api/pom.xml
@@ -126,6 +126,12 @@
                         <exclude>org/apache/hadoop/yarn/client/api/package-info.class</exclude>
                       </excludes>
                     </filter>
+                    <filter>
+                      <artifact>org.apache.hadoop:*</artifact>
+                      <excludes>
+                        <exclude>org/apache/hadoop/hdfs/server/protocol/package-info.class</exclude>
+                      </excludes>
+                    </filter>
                   </filters>
                   <relocations>
                     <relocation>

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 1s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 29s trunk passed
+1 💚 compile 1m 25s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 1m 10s trunk passed
+1 💚 mvnsite 1m 30s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 27s trunk passed
+1 💚 shadedclient 22m 52s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 22s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 1m 18s the patch passed
+1 💚 compile 1m 12s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 1m 12s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 53s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 79 unchanged - 0 fixed = 81 total (was 79)
+1 💚 mvnsite 1m 19s the patch passed
+1 💚 javadoc 0m 51s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 3m 16s the patch passed
+1 💚 shadedclient 22m 30s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 241m 30s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 asflicense 0m 50s The patch does not generate ASF License warnings.
348m 18s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/3/artifact/out/Dockerfile
GITHUB PR #5460
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 8bfa574c9407 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9604ad2
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/3/testReport/
Max. process+thread count 2909 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@sodonnel
Contributor Author

Strange, I think this one should hopefully work:

--- a/hadoop-client-modules/hadoop-client-api/pom.xml
+++ b/hadoop-client-modules/hadoop-client-api/pom.xml
@@ -126,6 +126,12 @@
                         <exclude>org/apache/hadoop/yarn/client/api/package-info.class</exclude>
                       </excludes>
                     </filter>
+                    <filter>
+                      <artifact>org.apache.hadoop:*</artifact>
+                      <excludes>
+                        <exclude>org/apache/hadoop/hdfs/server/protocol/package-info.class</exclude>
+                      </excludes>
+                    </filter>
                   </filters>
                   <relocations>
                     <relocation>

@virajjasani Thanks! That has fixed the compile. I just pushed the change, so hopefully we will get a green build this time.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 25s Maven dependency ordering for branch
+1 💚 mvninstall 28m 32s trunk passed
+1 💚 compile 25m 37s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 21m 52s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 4m 9s trunk passed
+1 💚 mvnsite 2m 13s trunk passed
+1 💚 javadoc 1m 45s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 2m 9s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+0 🆗 spotbugs 0m 30s branch/hadoop-client-modules/hadoop-client-api no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 23m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 4m 35s the patch passed
+1 💚 compile 24m 38s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 24m 38s the patch passed
+1 💚 compile 21m 52s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 21m 52s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 52s the patch passed
+1 💚 mvnsite 2m 8s the patch passed
+1 💚 javadoc 1m 38s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 2m 14s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+0 🆗 spotbugs 0m 29s hadoop-client-modules/hadoop-client-api has no data from spotbugs
+1 💚 shadedclient 22m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 225m 51s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 💚 unit 0m 36s hadoop-client-api in the patch passed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
451m 39s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestRollingUpgrade
hadoop.hdfs.server.namenode.TestFsck
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/4/artifact/out/Dockerfile
GITHUB PR #5460
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname Linux 0c2731d15925 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0b4fe8e
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/4/testReport/
Max. process+thread count 2266 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs hadoop-client-modules/hadoop-client-api U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Contributor

The reported failures are just known flakes, not related.

+1 (non-binding) from my side, thank you @sodonnel @Hexiaoqiao

@sodonnel
Contributor Author

@Hexiaoqiao Are you happy with the latest revision to give a +1 so I can commit? Thanks.

Contributor

@Hexiaoqiao Hexiaoqiao left a comment


LGTM. +1 from my side. The failed unit tests seem unrelated to this PR.
Thanks @sodonnel and @virajjasani for your work.

@sodonnel sodonnel merged commit ca6f5af into apache:trunk Mar 11, 2023
asfgit pushed a commit that referenced this pull request Mar 11, 2023
#5460)

(cherry picked from commit ca6f5af)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportLease.java
asfgit pushed a commit that referenced this pull request Mar 11, 2023
#5460)

(cherry picked from commit d7b89d0)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportLease.java
@ayushtkn
Member

This wasn't the fix: #5460 (comment), rather it broke the javadoc build.
Run a mvn clean site and you will find an exception like

[ERROR] /hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/package-info.java:22: error: package org.apache.hadoop.hdfs.server.protocol has already been annotated
[ERROR] @InterfaceAudience.Private
[ERROR] ^
[ERROR] java.lang.AssertionError
[ERROR] 	at com.sun.tools.javac.util.Assert.error(Assert.java:126)
[ERROR] 	at com.sun.tools.javac.util.Assert.check(Assert.java:45)
[ERROR] 	at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:177)
[ERROR] 	at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
[ERROR] 	at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
[ERROR] 	at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
[ERROR] 	at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
[ERROR] 	at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
[ERROR] 	at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
[ERROR] 	at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
[ERROR] 	at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
[ERROR] 	at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
[ERROR] 	at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)

Now for the original problem that @sodonnel mentioned over here #5460 (comment): the enforcer gives an exception once he adds package-info.java

  Duplicate classes:
    org/apache/hadoop/hdfs/server/protocol/package-info.class

And the fix went in the direction of "this enforcer has gone crazy, let's filter this file itself", but that poor fellow wasn't doing anything wrong :)

Check file 1:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/package-info.java

Now your added file 2:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/package-info.java

One was already there in hdfs-client, and now this one got added for the same package in hdfs as well. Why the same package exists in both the client and hdfs jars, I think all of us here know those reasons, so not getting into that...

@sodonnel can you delete this new package-info.java file? That fixes the build, and we can note that the checkstyle warning is irrelevant/unavoidable when committing.

@virajjasani
Contributor

Or, do we just need to remove the IA.Private annotation from package-info.java? It seems we don't put IA.Private in any of the package-info.java files.

@virajjasani
Contributor

That way in future too, if anyone makes changes in the hadoop-hdfs module's hdfs/server/protocol, jenkins won't give any redundant warning?

@sodonnel
Contributor Author

Do you want a follow-up PR to just remove the file, or is just removing the IA.Private line enough? TBH, I just copied the contents from another package-info file.

@ayushtkn
Member

Just remove the file and the filter; no need to have dupes. Many package-info files must be having those IA.Private annotations.

TBH, I just copied the contents from another package-info file.

that's what all of us do :)

@virajjasani
Contributor

Ah, looks like we have IA annotations in some and not in others. Looks like removing might be better. Thanks!

I wonder how we deal with it in future though? When someone makes changes only in the hadoop-hdfs module's protocol package, will the checkstyle failure unnecessarily get attention?

@sodonnel
Contributor Author

But will the checkstyle not affect all future PRs? I think I have seen that happen before, where a new checkstyle issue comes in, and then future PRs are impacted by it, but I may be wrong.

I will remove the file and see what it does.

@virajjasani
Contributor

But will the checkstyle not affect all future PRs?

Correct, that's what I was wondering :)

@ayushtkn
Member

ayushtkn commented Mar 14, 2023

But will the checkstyle not affect all future PRs? I think I have seen that happen before, where a new checkstyle issue comes in, and then future PRs are impacted by it, but I may be wrong.

How? You folks mean the new PR will also show this checkstyle warning? Our checkstyle works by computing the diff between trunk and the patch, and it flags when the number changes.

if you check the warning above:

hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 79 unchanged - 0 fixed = 81 total (was 79)

There were already 79, the PR had 2 more, so that is what it flags

@sodonnel Ozone figures out checkstyle in a different way; maybe you have seen it there, where that would most probably be true.
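The diff-based accounting ayushtkn describes can be sketched like this. It is a toy model for illustration only: CheckstyleDiff and summarize are made-up names, and real Yetus compares per-file, per-line warning reports rather than plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy version of the diff-based checkstyle accounting: compare the
// branch warnings with the patch warnings and report the same shape
// of summary, "N new + U unchanged - F fixed = T total (was B)".
class CheckstyleDiff {
  static String summarize(Set<String> branch, Set<String> patch) {
    Set<String> fixed = new HashSet<>(branch);
    fixed.removeAll(patch);                 // in branch, gone in patch
    Set<String> added = new HashSet<>(patch);
    added.removeAll(branch);                // introduced by the patch
    int unchanged = patch.size() - added.size();
    return added.size() + " new + " + unchanged + " unchanged - "
        + fixed.size() + " fixed = " + patch.size()
        + " total (was " + branch.size() + ")";
  }
}
```

Under this scheme a warning that already exists on trunk is counted as "unchanged" on every PR, which is why the pre-existing 79 warnings are not flagged and only the 2 introduced by the patch are.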

@virajjasani
Contributor

virajjasani commented Mar 14, 2023

How? You folks mean the new PR will also show this checkstyle warning?

Correct, I am only worried about the amount of time folks will end up wasting in future on such warnings (every time a new file is added, or perhaps even when an existing file is modified under the package?):

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/{new-file-name}.java:1:/**: Missing package-info.java file. [JavadocPackage]

As was the case with this PR:

  1. https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
  2. https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
  3. https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5460/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt

@sodonnel
Contributor Author

Posted #5478 to address the problem
