@Samrat002
Contributor

Description of PR

When a Hadoop cluster runs in the cloud on spot instances, an AM may be launched on one of those instances. When such instances are removed, we have observed many AM launch failures caused by Token Expired or Container Liveliness Expiry errors, because the AM launch threads are busy retrying connections to AM hosts (spot instances) that are already down. Having separate thread pools for cleanup and launch will reduce these AM launch failures.
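To illustrate the idea, here is a minimal sketch of routing launch and cleanup work to two independent pools, so that a cleanup task stuck on an unreachable host cannot hold up a launch. It is not the code from this patch: the class name, the `EventType` enum, the `dispatch` method, and the pool sizes are all hypothetical and chosen only for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the patch.
public class SeparateLaunchCleanupPools {
  enum EventType { LAUNCH, CLEANUP }

  private final ExecutorService launchPool;
  private final ExecutorService cleanupPool;

  SeparateLaunchCleanupPools(int launchThreads, int cleanupThreads) {
    this.launchPool = Executors.newFixedThreadPool(launchThreads);
    this.cleanupPool = Executors.newFixedThreadPool(cleanupThreads);
  }

  // Route each task to the pool for its type so the two kinds of work
  // never compete for the same threads.
  void dispatch(EventType type, Runnable task) {
    if (type == EventType.LAUNCH) {
      launchPool.execute(task);
    } else {
      cleanupPool.execute(task);
    }
  }

  void shutdown() {
    launchPool.shutdownNow();
    cleanupPool.shutdownNow();
  }

  // Simulates a cleanup blocked on a dead host and checks that a launch
  // submitted afterwards still completes.
  static boolean launchCompletesWhileCleanupBlocked() {
    SeparateLaunchCleanupPools pools = new SeparateLaunchCleanupPools(1, 1);
    CountDownLatch hostNeverAnswers = new CountDownLatch(1);
    CountDownLatch launched = new CountDownLatch(1);
    try {
      pools.dispatch(EventType.CLEANUP, () -> {
        try {
          hostNeverAnswers.await(); // stands in for endless connection retries
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      pools.dispatch(EventType.LAUNCH, launched::countDown);
      return launched.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pools.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("launch completed: " + launchCompletesWhileCleanupBlocked());
  }
}
```

With a single shared one-thread pool, the same sequence would leave the launch queued behind the blocked cleanup, which is the starvation pattern described above.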

Token Expired

2022-07-19 14:56:33,486 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl (IPC Server handler 39 on 8041): Unauthorized request to start container.
This token is expired. current time is 1658242593486 found 1658242289457
Note: System times on machines may be out of sync. Check system time and time zones.
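As a worked check of the message above, the two numbers can be subtracted to see how late the start request reached the NodeManager. This sketch assumes both values are epoch milliseconds and that the "found" value is the token's expiry timestamp; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for reading the "token is expired" log line.
public class TokenExpiryGap {
  // How long after the token's expiry the start request arrived.
  static Duration lateness(long currentMillis, long expiryMillis) {
    return Duration.between(
        Instant.ofEpochMilli(expiryMillis), Instant.ofEpochMilli(currentMillis));
  }

  public static void main(String[] args) {
    Duration late = lateness(1658242593486L, 1658242289457L);
    // prints "304029 ms after expiry"
    System.out.println(late.toMillis() + " ms after expiry");
  }
}
```

Under those assumptions the request arrived roughly 304 seconds after the token expired, which is consistent with the launch having waited a long time for a free thread.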

Container Liveliness Expiry

2022-07-19 16:06:48,663 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_xxxxxxxxxxxxx_xxxxxxx_xx_000001 Container Transitioned from ACQUIRED to EXPIRED

2022-07-19 16:10:08,663 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor (Ping Checker): Expired:<container=container_xxxxxxxxxxxxx_xxxxxxx_xx_000001, increase=false> Timed out after 600 secs

Associated ticket: YARN-11251

How was this patch tested?

This patch was tested on an EMR cluster with 1 master node, 1 core node, and 2 task nodes, where the task nodes are spot instances. We launched an AM on one of the task nodes and then brought that node down, which reproduces this scenario.

TODO: unit tests need to be added.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 21m 21s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 42m 45s | | trunk passed |
| +1 💚 | compile | 1m 26s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 1m 30s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | checkstyle | 1m 5s | | trunk passed |
| +1 💚 | mvnsite | 1m 31s | | trunk passed |
| +1 💚 | javadoc | 1m 19s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 14s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | spotbugs | 2m 34s | | trunk passed |
| +1 💚 | shadedclient | 29m 24s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 29m 53s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 58s | | the patch passed |
| +1 💚 | compile | 0m 56s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 0m 56s | | the patch passed |
| +1 💚 | compile | 0m 55s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 0m 55s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 31s | | the patch passed |
| +1 💚 | mvnsite | 1m 2s | | the patch passed |
| -1 ❌ | javadoc | 0m 45s | /results-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 1 new + 5635 unchanged - 0 fixed = 5636 total (was 5635) |
| -1 ❌ | javadoc | 0m 44s | /results-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 1 new + 5437 unchanged - 0 fixed = 5438 total (was 5437) |
| +1 💚 | spotbugs | 2m 13s | | the patch passed |
| +1 💚 | shadedclient | 28m 2s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 116m 29s | /patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 💚 | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 258m 5s | | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation |
| | hadoop.yarn.server.resourcemanager.TestRMHA |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8208/1/artifact/out/Dockerfile |
| GITHUB PR | #8208 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 1e1f308f4875 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3ee41a6 |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8208/1/testReport/ |
| Max. process+thread count | 966 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8208/1/console |
| versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.
