
HDFS-16953. RBF: Mount table store APIs should update cache only if state store record is successfully updated #5482

Merged
2 commits merged into apache:trunk on Mar 18, 2023

Conversation

virajjasani
Contributor

The RBF mount table state store APIs addMountTableEntry, updateMountTableEntry, and removeMountTableEntry perform a cache refresh for all routers regardless of the actual record update result. If the record fails to be updated in the ZooKeeper- or file-based store implementation, reloading the cache on every router is unnecessary.

For instance, adding the same mount point simultaneously from two clients can cause the second call to fail: if the first call has not yet added the new entry by the time the second call retrieves the mount table entries via getMountTableEntries, the second call proceeds to addMountTableEntry and then fails because the record already exists.

DEBUG [{cluster}/{ip}:8111] ipc.Client - IPC Client (1826699684) connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user}IPC Client (1826699684) connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user} sending #1 org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocol.addMountTableEntry
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684) connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user} got value #1
DEBUG [main] ipc.ProtobufRpcEngine2 - Call: addMountTableEntry took 24ms
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684) connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user}: closed
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684) connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user}: stopped, remaining connections 0
TRACE [main] ipc.ProtobufRpcEngine2 - 1: Response <- nn-0-{ns}.{cluster}/{ip}:8111: addMountTableEntry {status: false}

Cannot add mount point /data503 

The failure to write the new record:

INFO  [IPC Server handler 0 on default port 8111] impl.StateStoreZooKeeperImpl - Cannot write record "/hdfs-federation/MountTable/0SLASH0data503", it already exists 

Since the successful call has already refreshed the cache for all routers, the failed second call should not refresh the cache for all routers again, because every router already has the updated records in its cache.
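
A minimal sketch of the guard this change is after, assuming simplified placeholder interfaces (the class and method names below are illustrative, not the actual Router admin classes): the router-wide cache refresh is triggered only when the state store reports that the record was actually written.

```java
import java.io.IOException;

/**
 * Illustrative sketch only, not the actual patch: the point is to gate the
 * router-wide cache refresh on the state store update result. All class and
 * method names here are hypothetical placeholders.
 */
public class MountTableAdminSketch {

  interface MountTableStore {
    /** Returns true only if the record was actually written to the state store. */
    boolean addEntry(String src, String dest) throws IOException;
  }

  interface RouterCacheRefresher {
    /** Asks every router to reload its mount table cache. */
    void refreshAll() throws IOException;
  }

  private final MountTableStore store;
  private final RouterCacheRefresher refresher;

  MountTableAdminSketch(MountTableStore store, RouterCacheRefresher refresher) {
    this.store = store;
    this.refresher = refresher;
  }

  public boolean addMountTableEntry(String src, String dest) throws IOException {
    boolean updated = store.addEntry(src, dest);
    if (updated) {
      // Only propagate the change when the record actually landed in the store.
      // A failed add (e.g. "record already exists") leaves nothing new to load,
      // so refreshing every router's cache would be wasted work.
      refresher.refreshAll();
    }
    return updated;
  }
}
```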

@virajjasani
Contributor Author

virajjasani commented Mar 16, 2023

@goiri @ayushtkn @Hexiaoqiao could you please review this PR?

Member

@ayushtkn ayushtkn left a comment


Reading the description, I got a bit confused around the failed mount entry add.
So, just to clarify,

In layman's terms:

  • There were concurrent addMountTableEntry calls for the same mount path from different routers.

  • One router succeeded and the other failed because the entry was already added?

  • The failure was expected and normal, so there is no bug here? As far as I understand, the answer is yes.

  • The only problem is that since the add entry failed, there was no need to update the cache because nothing changed, and that is what this change does and the description says?

If I got it all right, then the changes LGTM

@virajjasani
Contributor Author

If I got it all right

Absolutely sir, all points are correct. Sorry for adding too many details above. I wish I had a way to write a test to validate this; I tried but could not find a good way. We have some tests like TestRouterAdminCLI#testAddMountTable in which one of the addMountTableEntry calls fails (log: Cannot add mount point /test-addmounttable), but validating that we prevent reloading the cache for an unsuccessful attempt seems quite tricky; otherwise a small test might have made this clearer.

@virajjasani
Contributor Author

While verifying this patch's behavior exactly might not be feasible in a non-flaky test, let me at least validate that both source and destination are present after one successful and one failed attempt to add the mount table entry.
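
For reference, a rough sketch of that validation idea, assuming a simple in-memory stand-in for the mount table store (the class and helpers below are hypothetical; the real test goes through the Router admin path, e.g. in TestRouterAdminCLI): one add succeeds, a duplicate add for the same source fails, and the original source and destination are still present.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

/** Hypothetical sketch of the test idea, not the test added in this PR. */
public class TestDuplicateMountAddSketch {

  /** Minimal in-memory stand-in: add fails if the source already exists. */
  static class InMemoryMountStore {
    private final Map<String, String> entries = new HashMap<>();

    boolean addEntry(String src, String dest) {
      return entries.putIfAbsent(src, dest) == null;
    }

    String getDest(String src) {
      return entries.get(src);
    }
  }

  @Test
  public void testSecondAddFailsButEntrySurvives() {
    InMemoryMountStore store = new InMemoryMountStore();

    // First add succeeds; this is the call that would refresh the router caches.
    assertTrue(store.addEntry("/data503", "ns0:/data503"));

    // Second add of the same mount point fails; no cache refresh should follow.
    assertFalse(store.addEntry("/data503", "ns1:/other"));

    // Both the source and its original destination are still present.
    assertEquals("ns0:/data503", store.getDest("/data503"));
  }
}
```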

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 33s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 57s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 23m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 34s the patch passed
+1 💚 javadoc 0m 32s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 50s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 21s the patch passed
+1 💚 shadedclient 23m 39s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 21m 22s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
126m 5s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/1/artifact/out/Dockerfile
GITHUB PR #5482
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5883201056f8 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b824fa7
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/1/testReport/
Max. process+thread count 2640 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 26s trunk passed
+1 💚 compile 0m 46s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 41s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 47s trunk passed
+1 💚 javadoc 0m 53s trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 1m 2s trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 35s trunk passed
+1 💚 shadedclient 21m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 32s the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 50s the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
+1 💚 spotbugs 1m 21s the patch passed
+1 💚 shadedclient 20m 29s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 25s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
116m 53s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/2/artifact/out/Dockerfile
GITHUB PR #5482
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 32ba38e4806c 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 84c2276
Default Java Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/2/testReport/
Max. process+thread count 2723 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5482/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@goiri goiri changed the title HDFS-16953. RBF Mount table store APIs should update cache only if state store record is successfully updated HDFS-16953. RBF: Mount table store APIs should update cache only if state store record is successfully updated Mar 18, 2023
@goiri goiri merged commit f8d0949 into apache:trunk Mar 18, 2023
1 check passed
ferdelyi pushed a commit to ferdelyi/hadoop that referenced this pull request May 26, 2023