Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-28184 Addendum PR #5521

Open
wants to merge 2 commits into
base: branch-2.5
Choose a base branch
from

Conversation

shahrs87
Copy link
Contributor

No description provided.

@@ -259,10 +259,11 @@ private boolean readNextEntryAndRecordReaderPosition() throws IOException {
Entry readEntry = reader.next();
long readerPos = reader.getPosition();
OptionalLong fileLength;
if (logQueue.getQueueSize(walGroupId) > 1) {
if (logQueue.getQueueSize(walGroupId) > 2) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for less than equal to 2 log files in the queue because it is possible that we are doing this check just when the WAL is rolled and the reader has not read the trailer bytes of the old WAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestBasicWALEntryStreamFSHLog#testCleanClosedWALs is failing in branch-2.5 with the original PR #5505
This test is not failing for master branch. There is lot of code refactor in master/branch-2 compared to branch-2.5.
@Apache9 @sunhelly Can you please review? Thank you !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to greater than 2 can fix the failing tests? A bit strange, could you please exlain more on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any updates here?

Copy link
Contributor Author

@shahrs87 shahrs87 Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any updates here?

@Apache9 Sorry couldn't update this thread in a long time. Got distracted somewhere and it fell off my radar.

Changing to greater than 2 can fix the failing tests? A bit strange, could you please exlain more on this?

Actually changing to greater than 2 fixes the failing test but looks like it is not the right fix.
The test is doing the following:

  1. Creating WAL named wal1
  2. Appending some entries to wal1
  3. Calling entryStream.next to read from wal1
  4. Roll the WAL to wal2
  5. Append some entries to wal2
  6. Call entryStream.next to read from wal2
  7. Test that there are NO uncleanlyClosedLogs metric.

The test is failing at #6 above. When it is calling entryStream.next on wal2, the replication code needs to switch the reader to the new WAL file. During rollWriter, we add it to AbstractFSWAL#inflightWALClosures map and close the old WAL file asynchronously here.

In closeWriter method, we append the trailer to the WAL and then close it.
During the closeWriter execution there will be 2 WALs in the logQueue.
Now in WALEntryStream#next method, after this change, we don't read the file length if logQueue size is greater than 1 and hence WALEntryStream is unaware of the trailer bytes and while switching the wal from wal1 to wal2, it gets the following exception:

2024-01-17T10:05:47,247 DEBUG [Listener at localhost/52964] wal.ProtobufLogReader(447): Encountered a malformed edit, seeking back to last good position in file, from 218 to 210
java.io.EOFException: Invalid PB, EOF? Ignoring; originalPosition=210, currentPosition=218
	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:376) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:104) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:92) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:259) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:181) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:102) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.peek(WALEntryStream.java:111) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.next(WALEntryStream.java:118) ~[classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStreamTestBase$WALEntryStreamWithRetries.access$001(WALEntryStreamTestBase.java:82) ~[test-classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStreamTestBase$WALEntryStreamWithRetries.lambda$next$0(WALEntryStreamTestBase.java:95) ~[test-classes/:?]
	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:184) ~[test-classes/:?]
	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:135) ~[test-classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStreamTestBase$WALEntryStreamWithRetries.next(WALEntryStreamTestBase.java:94) ~[test-classes/:?]
	at org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStream.testCleanClosedWALs(TestBasicWALEntryStream.java:726) ~[test-classes/:?]

I think I know how to fix.
From PR-5505, we have the below check

    OptionalLong fileLength;
    if (logQueue.getQueueSize(walGroupId) > 1) {
      fileLength = OptionalLong.empty();
    } else {
      // if there is only one file in queue, check whether it is still being written to
      fileLength = walFileLengthProvider.getLogFileSizeIfBeingWritten(currentPath);
    }

Along with checking queue size, we also have to check if the currently replicated WAL is not in AbstractFSWAL#inflightWALClosures map then it is safe to not read the file size.

But currently there is NO way to access AbstractFSWAL object from WALEntryStream. I found one class named WALFileLengthProvider but that is a functional interface and as the name suggests it only provides log file size if that file is currently being written.

@Apache9 Any ideas on how can we expose AbstractFSWAL object to WALEntryStream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9 Can you please take a look in my prev comment?

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 2m 50s branch-2.5 passed
+1 💚 compile 2m 20s branch-2.5 passed
+1 💚 checkstyle 0m 37s branch-2.5 passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 29s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 37s the patch passed
+1 💚 compile 2m 22s the patch passed
+1 💚 javac 2m 22s the patch passed
+1 💚 checkstyle 0m 37s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 13m 22s Patch does not cause any errors with Hadoop 2.10.2 or 3.2.4 3.3.6.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 40s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
31m 26s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5521
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 8e29c5cfa8ed 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 3m 5s branch-2.5 passed
+1 💚 compile 0m 42s branch-2.5 passed
+1 💚 shadedjars 5m 13s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 31s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 shadedjars 5m 6s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 21s the patch passed
_ Other Tests _
+1 💚 unit 192m 10s hbase-server in the patch passed.
214m 42s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5521
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1bd242a6505a 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/testReport/
Max. process+thread count 4816 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 43s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 2m 21s branch-2.5 passed
+1 💚 compile 0m 39s branch-2.5 passed
+1 💚 shadedjars 4m 30s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 12s the patch passed
+1 💚 compile 0m 41s the patch passed
+1 💚 javac 0m 41s the patch passed
+1 💚 shadedjars 4m 30s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
+1 💚 unit 201m 29s hbase-server in the patch passed.
222m 6s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #5521
Optional Tests javac javadoc unit shadedjars compile
uname Linux bac73d2e8656 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/testReport/
Max. process+thread count 3969 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 2m 54s branch-2.5 passed
+1 💚 compile 2m 24s branch-2.5 passed
+1 💚 checkstyle 0m 37s branch-2.5 passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 28s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 39s the patch passed
+1 💚 compile 2m 20s the patch passed
+1 💚 javac 2m 20s the patch passed
+1 💚 checkstyle 0m 35s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 13m 25s Patch does not cause any errors with Hadoop 2.10.2 or 3.2.4 3.3.6.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 34s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
31m 37s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5521
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 7ae1ce686c8f 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 78 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 5s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 2m 33s branch-2.5 passed
+1 💚 compile 0m 42s branch-2.5 passed
+1 💚 shadedjars 5m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 32s the patch passed
+1 💚 compile 0m 42s the patch passed
+1 💚 javac 0m 42s the patch passed
+1 💚 shadedjars 5m 9s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s the patch passed
_ Other Tests _
+1 💚 unit 192m 19s hbase-server in the patch passed.
214m 26s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5521
Optional Tests javac javadoc unit shadedjars compile
uname Linux aa4e071303cc 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/testReport/
Max. process+thread count 4689 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 41s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.5 Compile Tests _
+1 💚 mvninstall 2m 14s branch-2.5 passed
+1 💚 compile 0m 41s branch-2.5 passed
+1 💚 shadedjars 4m 28s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s branch-2.5 passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 14s the patch passed
+1 💚 compile 0m 39s the patch passed
+1 💚 javac 0m 39s the patch passed
+1 💚 shadedjars 4m 29s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
+1 💚 unit 200m 51s hbase-server in the patch passed.
221m 29s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #5521
Optional Tests javac javadoc unit shadedjars compile
uname Linux fed2de80d9b9 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.5 / 16703c9
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/testReport/
Max. process+thread count 4051 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5521/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants