HDFS-13603: do not propagate ExecutionException and add maxRetries li… #6774

Open
wants to merge 1 commit into
base: trunk

@yzhang559 yzhang559 commented Apr 26, 2024

Description of PR

JIRA = HDFS-13603
The EDEK cache warm-up thread should not fail the warm-up of all other keys when it encounters an invalid key.
We have observed infinite retries against the KMS when one of the encryption keys is unavailable.

Change the behavior to:

  • Throw IOException only if cache warm-up fails for all keys; otherwise continue warming up the remaining keys.
  • Retry only if warm-up fails for all keys, and add a config for the retry limit.
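
The two bullets above could be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the class `WarmUpSketch`, the `KeyFiller` interface, and both method names are hypothetical stand-ins, and the sleep between attempts is omitted for brevity.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class WarmUpSketch {

  /** Assumed stand-in for whatever fills the cache for a single key. */
  interface KeyFiller {
    void fill(String keyName) throws IOException;
  }

  /**
   * Tries every key; a failure on one key does not stop the others.
   * Throws only when no key at all could be warmed up.
   * Returns the number of keys warmed up successfully.
   */
  static int warmUpAll(List<String> keyNames, KeyFiller filler) throws IOException {
    int successes = 0;
    IOException last = null;
    for (String key : keyNames) {
      try {
        filler.fill(key);
        successes++;
      } catch (IOException e) {
        last = e; // remember the failure and keep going
      }
    }
    if (!keyNames.isEmpty() && successes == 0) {
      throw new IOException("Failed to warm up any key", last);
    }
    return successes;
  }

  /**
   * Retries the whole warm-up only when it failed for all keys,
   * and gives up after maxRetries attempts. Returns true on success.
   */
  static boolean warmUpWithRetries(List<String> keyNames, KeyFiller filler, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        warmUpAll(keyNames, filler);
        return true;
      } catch (IOException e) {
        // every key failed on this attempt; try again if attempts remain
      }
    }
    return false;
  }
}
```

With this shape, a single bad key no longer triggers a retry, and a KMS that rejects every key is contacted at most `maxRetries` times per key.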

How was this patch tested?

Added unit test TestFSDirEncryptionZoneOp for retry behavior

Related unit tests

mvn test -Dtest=TestEncryptionZones,TestEncryptionZonesWithKMS,TestFSDirEncryptionZoneOp

[INFO] Running org.apache.hadoop.hdfs.TestEncryptionZones
[INFO] Tests run: 44, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 137.217 s - in org.apache.hadoop.hdfs.TestEncryptionZones
[INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 187.815 s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestFSDirEncryptionZoneOp
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.331 s - in org.apache.hadoop.hdfs.server.namenode.TestFSDirEncryptionZoneOp

mvn test -Dtest=TestValueQueue
[INFO] Running org.apache.hadoop.crypto.key.TestValueQueue
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.893 s - in org.apache.hadoop.crypto.key.TestValueQueue

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? NA
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0? NA
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files? NA

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 52s Maven dependency ordering for branch
+1 💚 mvninstall 37m 8s trunk passed
+1 💚 compile 19m 7s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 18m 11s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 4m 43s trunk passed
+1 💚 mvnsite 3m 57s trunk passed
+1 💚 javadoc 3m 8s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 3m 20s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 6m 53s trunk passed
+1 💚 shadedclient 41m 33s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 2m 30s the patch passed
+1 💚 compile 18m 28s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 18m 28s the patch passed
+1 💚 compile 18m 30s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 18m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 37s the patch passed
+1 💚 mvnsite 3m 53s the patch passed
+1 💚 javadoc 3m 1s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 3m 16s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 7m 23s the patch passed
+1 💚 shadedclient 41m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 2s hadoop-common in the patch passed.
+1 💚 unit 3m 50s hadoop-kms in the patch passed.
-1 ❌ unit 263m 51s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
-1 ❌ asflicense 1m 8s /results-asflicense.txt The patch generated 1 ASF License warnings.
549m 30s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6774/1/artifact/out/Dockerfile
GITHUB PR #6774
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 50f4cb272f31 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f0e0386
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6774/1/testReport/
Max. process+thread count 3328 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6774/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 01s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 3m 16s Maven dependency ordering for branch
+1 💚 mvninstall 91m 46s trunk passed
+1 💚 compile 40m 45s trunk passed
+1 💚 checkstyle 6m 01s trunk passed
-1 ❌ mvnsite 4m 26s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 15m 25s trunk passed
+1 💚 shadedclient 172m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 18s Maven dependency ordering for patch
+1 💚 mvninstall 12m 21s the patch passed
+1 💚 compile 38m 15s the patch passed
+1 💚 javac 38m 15s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 6m 12s the patch passed
-1 ❌ mvnsite 4m 35s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 15m 13s the patch passed
+1 💚 shadedclient 183m 04s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ asflicense 5m 52s /results-asflicense.txt The patch generated 1 ASF License warnings.
556m 36s
Subsystem Report/Notes
GITHUB PR #6774
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname MINGW64_NT-10.0-17763 26e5efede4c6 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / f0e0386
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/1/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 02s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 6m 23s Maven dependency ordering for branch
+1 💚 mvninstall 130m 49s trunk passed
+1 💚 compile 61m 01s trunk passed
+1 💚 checkstyle 9m 30s trunk passed
-1 ❌ mvnsite 6m 49s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 23m 26s trunk passed
+1 💚 shadedclient 256m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 3m 30s Maven dependency ordering for patch
+1 💚 mvninstall 18m 36s the patch passed
+1 💚 compile 56m 38s the patch passed
+1 💚 javac 56m 38s the patch passed
+1 💚 blanks 0m 01s The patch has no blanks issues.
+1 💚 checkstyle 9m 10s the patch passed
-1 ❌ mvnsite 6m 55s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 22m 55s the patch passed
+1 💚 shadedclient 263m 03s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ asflicense 8m 50s /results-asflicense.txt The patch generated 1 ASF License warnings.
815m 43s
Subsystem Report/Notes
GITHUB PR #6774
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname MINGW64_NT-10.0-17763 5f2bdf72e508 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / f0e0386
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 02s No case conflicting files found.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 18s Maven dependency ordering for branch
+1 💚 mvninstall 92m 35s trunk passed
+1 💚 compile 41m 14s trunk passed
+1 💚 checkstyle 6m 25s trunk passed
-1 ❌ mvnsite 4m 36s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 15m 58s trunk passed
+1 💚 shadedclient 179m 02s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 31s Maven dependency ordering for patch
+1 💚 mvninstall 12m 28s the patch passed
+1 💚 compile 37m 59s the patch passed
+1 💚 javac 37m 59s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 6m 14s the patch passed
-1 ❌ mvnsite 4m 50s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 16m 21s the patch passed
+1 💚 shadedclient 188m 11s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 54s The patch does not generate ASF License warnings.
568m 35s
Subsystem Report/Notes
GITHUB PR #6774
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname MINGW64_NT-10.0-17763 144c7ee9888d 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 10d763a
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/3/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/3/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Member

@simbadzina simbadzina left a comment

I'm concerned about changing the contract of warmUpEncryptedKeys from

exception if any key isn't initialized

to

exception only if all keys can't be initialized

You can still limit the number of retries without changing the above.
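
One way to read this suggestion is sketched below: keep the strict contract (any failing key fails the warm-up) and bound only the retry loop. The names `StrictContractSketch`, `KeyFiller`, `warmUpStrict`, and `retryBounded` are hypothetical, and `IOException` is used as a simplification for whatever exception the real method throws.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of "limit retries, keep the contract". */
class StrictContractSketch {

  /** Assumed stand-in for filling the cache for one key. */
  interface KeyFiller {
    void fill(String keyName) throws IOException;
  }

  /** Strict contract: the first key that fails aborts the whole warm-up. */
  static void warmUpStrict(List<String> keyNames, KeyFiller filler) throws IOException {
    for (String key : keyNames) {
      filler.fill(key);
    }
  }

  /**
   * Runs the strict warm-up at most maxRetries times.
   * Returns the attempt number that succeeded, or -1 if none did.
   */
  static int retryBounded(List<String> keyNames, KeyFiller filler, int maxRetries) {
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        warmUpStrict(keyNames, filler);
        return attempt;
      } catch (IOException e) {
        // any single key failure counts as a failed attempt
      }
    }
    return -1;
  }
}
```

Under this variant a permanently invalid key still consumes every retry, but the loop terminates instead of retrying forever.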

@@ -269,12 +269,23 @@ public ValueQueue(final int numValues, final float lowWaterMark, long expiry,
* Initializes the Value Queues for the provided keys by calling the
* fill Method with "numInitValues" values
* @param keyNames Array of key Names
- * @throws ExecutionException executionException.
+ * @throws IOException if no successful initialization for any key
Member

The wording here is confusing. One way to read this is that if any key fails to initialize, an exception will be thrown. But IIUC, an exception will be thrown only if all keys fail to initialize.

@@ -537,12 +537,12 @@ static boolean isInAnEZ(final FSDirectory fsd, final INodesInPath iip)
* then launch up a separate thread to warm them up.
*/
static void warmUpEdekCache(final ExecutorService executor,
- final FSDirectory fsd, final int delay, final int interval) {
+ final FSDirectory fsd, final int delay, final int interval, final int maxRetries) {
Member

Can you add a comment to the function documentation indicating that the warm-up is best effort?

success = true;
break;
} catch (IOException ioe) {
lastSeenIOE = ioe;
if (sinceLastLog >= logCoolDown) {
Member

sinceLastLog is no longer really used now. You can just print the failure since the retry count is limited.


verify(kpMock, times(maxRetries)).warmUpEncryptedKeys(any());
}
}
Member

Can you add a test case in which one or some of the keys are successfully warmed up while others aren't?

}

if (keyNames.length > 0 && successfulInitializations == 0) {
throw new IOException("Failed to initialize any queue for the provided keys.", lastException);
Member

It seems you've made warm-up a best-effort operation. If so, there should be no need to throw an exception here. Just logging a warning should be enough.
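
A rough sketch of what the log-only variant might look like, assuming warm-up is fully best effort. The class and method names are hypothetical, and `java.util.logging` is used only to keep the example self-contained; the actual code would use the project's own logger.

```java
import java.io.IOException;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of a warm-up that logs instead of throwing. */
class BestEffortLogSketch {
  private static final Logger LOG = Logger.getLogger("BestEffortLogSketch");

  /** Assumed stand-in for filling the cache for one key. */
  interface KeyFiller {
    void fill(String keyName) throws IOException;
  }

  /** Warms up as many keys as possible and returns how many succeeded. */
  static int warmUpBestEffort(List<String> keyNames, KeyFiller filler) {
    int successes = 0;
    for (String key : keyNames) {
      try {
        filler.fill(key);
        successes++;
      } catch (IOException e) {
        LOG.warning("Warm-up failed for key " + key + ": " + e.getMessage());
      }
    }
    if (!keyNames.isEmpty() && successes == 0) {
      // No exception: the caller decides from the return value whether to retry.
      LOG.warning("Warm-up failed for all " + keyNames.size() + " keys");
    }
    return successes;
  }
}
```

Returning the success count keeps enough information for a caller to decide on retries without relying on an exception for control flow.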

}
sinceLastLog += retryInterval;

It does not get updated since initial settings. Shall we add it back or remove its usages completely as Simba said?
