This repository was archived by the owner on Jul 10, 2024. It is now read-only.

Restarting Hadoop services fails after configuring GPU settings for RM, NM and container-executor.cfg #198

@ChanaLii

Description


Versions:
submarine-v0.3.0
hadoop-v3.2.1

I followed the documentation to set up GPU support for the ResourceManager, NodeManager and container-executor.cfg in my environment.
Then I restarted Hadoop with the following commands:
YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start historyserver
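In Hadoop 3.x these `*-daemon.sh` scripts are, as far as I know, deprecated wrappers around the `--daemon` option of the `yarn` and `mapred` commands. A minimal sketch of the same startup in that form, assuming `yarn` and `mapred` from the Hadoop 3.2.1 install are on `PATH`; the function name `start_yarn_daemons` is hypothetical, and the custom log file names used above are left out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the same four daemons as the commands above,
# using the Hadoop 3.x "--daemon" form. Assumes yarn and mapred are on PATH.
start_yarn_daemons() {
  local d
  for d in resourcemanager nodemanager timelineserver; do
    yarn --daemon start "$d" || echo "failed to start $d" >&2
  done
  # The MapReduce JobHistory server is started through the mapred command.
  mapred --daemon start historyserver || echo "failed to start historyserver" >&2
}
```

A failing daemon only produces a message on stderr here; the loop still tries the remaining daemons.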

I used the jps command to check whether the services were running. Unfortunately, the NodeManager service had not started, and I found the following errors in hadoop-root-nodemanager-71192c388b55.log:
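The jps check can be scripted. A small sketch, assuming jps lists the daemons started above under their main class names (the timeline server typically appears as ApplicationHistoryServer); the function name `missing_daemons` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin ("<pid> <MainClass>" per line)
# and print every expected daemon that is missing. Returns 1 if any is missing.
missing_daemons() {
  local running missing=0 d
  # Keep only the main class name column.
  running=$(awk '{print $2}')
  for d in ResourceManager NodeManager ApplicationHistoryServer JobHistoryServer; do
    if ! grep -qx "$d" <<< "$running"; then
      echo "$d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | missing_daemons` prints one line per daemon that is not running, which in the situation described here would be NodeManager.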

2020-03-01 09:52:38,744 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Controller devices not mounted. You either need to mount it with yarn.nodemanager.linux-container-executor.cgroups.mount or mount cgroups before launching Yarn
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:392)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:370)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.bootstrap(GpuResourceHandlerImpl.java:93)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:146)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
2020-03-01 09:52:38,744 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler failed in state INITED
java.io.IOException: Failed to bootstrap configured resource subsystems!
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:150)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
2020-03-01 09:52:38,745 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to bootstrap configured resource subsystems!
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:323)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:516)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler.serviceInit(ContainerScheduler.java:150)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 8 more
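The root exception is `ResourceHandlerException: Controller devices not mounted`, raised from `GpuResourceHandlerImpl.bootstrap`: the GPU handler needs the cgroup `devices` controller, which YARN either mounts itself (`yarn.nodemanager.linux-container-executor.cgroups.mount`) or expects to find already mounted. A minimal sketch that looks for that controller in a mount table in `/proc/mounts` format, assuming the cgroup v1 layout; the function name `find_devices_cgroup` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mount point(s) of the cgroup v1 "devices"
# controller found in a /proc/mounts-style table read from stdin.
# Fields: device, mount point, fs type, comma-separated options, dump, pass.
# Returns 1 if no such mount exists.
find_devices_cgroup() {
  awk '
    $3 == "cgroup" {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "devices") { print $2; found = 1 }
    }
    END { exit found ? 0 : 1 }
  '
}
```

Usage inside the container: `find_devices_cgroup < /proc/mounts || echo "devices controller not mounted"`.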

It seems that /sys/fs/cgroup is not mounted in the environment. Here is the command I used to start the Docker container:
docker run -it -v /data/docker-images/:/sys/fs/cgroup -m 10G 968d612886ee bash
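The `-v /data/docker-images/:/sys/fs/cgroup` option bind-mounts the host directory `/data/docker-images/` over `/sys/fs/cgroup` inside the container, so what the NodeManager sees there is whatever that host directory contains, presumably not a cgroup hierarchy. A small sketch to confirm which filesystem type is mounted at that path, again reading a `/proc/mounts`-style table; the function name `fstype_at` is hypothetical. A cgroup v1 layout is typically `tmpfs` at `/sys/fs/cgroup` with per-controller `cgroup` mounts below it, while a type such as `ext4` or `xfs` would point to an ordinary bind-mounted directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the filesystem type mounted at exactly PATH
# (default /sys/fs/cgroup), reading a /proc/mounts-style table on stdin.
# If several mounts are stacked on the same path, the last one listed is on top.
fstype_at() {
  local target=${1:-/sys/fs/cgroup}
  awk -v t="$target" '$2 == t { type = $3 } END { if (type == "") exit 1; print type }'
}
```

Usage inside the container: `fstype_at < /proc/mounts`.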
Can somebody help me?
