xgboost on yarn is running beyond virtual memory limits #1246

Closed
vivounicorn opened this issue Jun 2, 2016 · 2 comments
vivounicorn commented Jun 2, 2016

Hi @tqchen,
When we run xgboost on YARN through dmlc, the job fails even though the training set is very small. Our Hadoop version is 2.4.1.

Diagnostics:Container [pid=40959,containerID=container_1462859744498_485365_01_000002] is running beyond virtual memory limits. Current usage: 76.1 MB of 2 GB physical memory used; 31.8 GB of 4.2 GB virtual memory used. Killing container.
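For context, the 4.2 GB virtual-memory cap reported above is the 2 GB physical allocation multiplied by YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1. One possible cluster-side mitigation (just a sketch, not necessarily the fix the maintainers would recommend for this issue) is to relax or disable the virtual-memory check in yarn-site.xml on the NodeManagers:

```xml
<!-- yarn-site.xml (NodeManager side): possible mitigations; either one may be enough -->
<property>
  <!-- Option 1: disable the virtual-memory check entirely -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <!-- Option 2: raise the virtual-to-physical memory ratio (default is 2.1) -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
```

Either change requires restarting the NodeManagers to take effect.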

The error details are as follows:

Container: container_1462859744498_485365_01_000003 on datanode16.data.cluster_50800
LogType: stderr
LogLength: 655
Log Contents:
readDirect: FSDataInputStream#read error:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:144)
[11:11:48] include/dmlc/logging.h:245: [11:11:48] src/io/hdfs_filesys.cc:44: HDFSStream.hdfsRead Error:Unknown error 255
terminate called after throwing an instance of 'dmlc::Error'
what(): [11:11:48] src/io/hdfs_filesys.cc:44: HDFSStream.hdfsRead Error:Unknown error 255

LogType: stdout
LogLength: 38
Log Contents:
[11:11:46] start datanode16.data.cluster:0

Container: container_1462859744498_485365_01_000001 on datanode46.data.cluster_48791
LogType: stderr
LogLength: 17621
Log Contents:
16/05/30 11:11:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/30 11:11:37 INFO dmlc.ApplicationMaster: Start AM as user=yarn
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: Try to start 0 Servers and 1 Workers
16/05/30 11:11:38 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.data.cluster/192.168.201.53:8030
16/05/30 11:11:38 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
16/05/30 11:11:38 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: [DMLC] ApplicationMaster started
16/05/30 11:11:40 INFO impl.AMRMClientImpl: Received new token for : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode61.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/
, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=0, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:40 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:40 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:43 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=40959,containerID=container_1462859744498_485365_01_000002] is running beyond virtual memory limits. Current usage: 76.1 MB of 2 GB physical memory used; 31.8 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 40959 33704 40959 40959 (bash) 0 0 108699648 268 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stderr
|- 40981 40967 40959 40959 (xgboost) 158 7 33956786176 18112 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 40967 40959 40959 40959 (python) 2 0 131223552 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/05/30 11:11:43 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000002. See LOG at : http://datanode61.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000002/yarn
16/05/30 11:11:43 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode16.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/
, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=1, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000004
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:49 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=83953,containerID=container_1462859744498_485365_01_000003] is running beyond virtual memory limits. Current usage: 197.6 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000003 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 83953 29156 83953 83953 (bash) 0 0 108650496 314 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stderr
|- 83986 83966 83953 83953 (xgboost) 355 22 33988182016 49165 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 83966 83953 83953 83953 (python) 1 1 131186688 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/05/30 11:11:49 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000003. See LOG at : http://datanode16.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000003/yarn
16/05/30 11:11:49 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode123.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/
, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=2, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000006
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000007
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:56 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=137020,containerID=container_1462859744498_485365_01_000005] is running beyond virtual memory limits. Current usage: 117.9 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000005 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 137052 137028 137020 137020 (xgboost) 243 14 33985527808 28756 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 137020 137486 137020 137020 (bash) 0 0 108654592 312 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stderr
|- 137028 137020 137020 137020 (python) 2 0 131186688 1108 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000005. See LOG at : http://datanode123.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000005/yarn
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Application completed. Stopping running containers
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode61.data.cluster:54536
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode16.data.cluster:50800
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode123.data.cluster:60096
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode68.data.cluster:19652
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode96.data.cluster:40278
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode136.data.cluster:19129
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Diagnostics., num_tasks1, finished=0, failed=1
[DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "main" java.lang.Exception: Application not successful
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.run(ApplicationMaster.java:290)
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.main(ApplicationMaster.java:115)

vivounicorn (Author) commented:

@tqchen thx

vivounicorn (Author) commented:

no reply

lock bot locked this issue as resolved and limited conversation to collaborators on Oct 26, 2018.