Container: container_1462859744498_485365_01_000001 on datanode46.data.cluster_48791
LogType: stderr
LogLength: 17621
Log Contents:
16/05/30 11:11:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/30 11:11:37 INFO dmlc.ApplicationMaster: Start AM as user=yarn
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: Try to start 0 Servers and 1 Workers
16/05/30 11:11:38 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.data.cluster/192.168.201.53:8030
16/05/30 11:11:38 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
16/05/30 11:11:38 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: [DMLC] ApplicationMaster started
16/05/30 11:11:40 INFO impl.AMRMClientImpl: Received new token for : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode61.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=0, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:40 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:40 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:43 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=40959,containerID=container_1462859744498_485365_01_000002] is running beyond virtual memory limits. Current usage: 76.1 MB of 2 GB physical memory used; 31.8 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 40959 33704 40959 40959 (bash) 0 0 108699648 268 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stderr
|- 40981 40967 40959 40959 (xgboost) 158 7 33956786176 18112 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 40967 40959 40959 40959 (python) 2 0 131223552 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000002. See LOG at : http://datanode61.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000002/yarn
16/05/30 11:11:43 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode16.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=1, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000004
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:49 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=83953,containerID=container_1462859744498_485365_01_000003] is running beyond virtual memory limits. Current usage: 197.6 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000003 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 83953 29156 83953 83953 (bash) 0 0 108650496 314 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stderr
|- 83986 83966 83953 83953 (xgboost) 355 22 33988182016 49165 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 83966 83953 83953 83953 (python) 1 1 131186688 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000003. See LOG at : http://datanode16.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000003/yarn
16/05/30 11:11:49 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode123.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=2, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000006
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000007
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:56 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=137020,containerID=container_1462859744498_485365_01_000005] is running beyond virtual memory limits. Current usage: 117.9 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000005 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 137052 137028 137020 137020 (xgboost) 243 14 33985527808 28756 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 137020 137486 137020 137020 (bash) 0 0 108654592 312 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stderr
|- 137028 137020 137020 137020 (python) 2 0 131186688 1108 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000005. See LOG at : http://datanode123.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000005/yarn
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Application completed. Stopping running containers
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode61.data.cluster:54536
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode16.data.cluster:50800
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode123.data.cluster:60096
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode68.data.cluster:19652
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode96.data.cluster:40278
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode136.data.cluster:19129
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Diagnostics., num_tasks1, finished=0, failed=1
[DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "main" java.lang.Exception: Application not successful
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.run(ApplicationMaster.java:290)
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.main(ApplicationMaster.java:115)
Hi @tqchen,
When we run xgboost with dmlc on YARN, the job fails even though the training set is very small. Our Hadoop version is 2.4.1.
Diagnostics:Container [pid=40959,containerID=container_1462859744498_485365_01_000002] is running beyond virtual memory limits. Current usage: 76.1 MB of 2 GB physical memory used; 31.8 GB of 4.2 GB virtual memory used. Killing container.
The error detail is as follows:
Container: container_1462859744498_485365_01_000003 on datanode16.data.cluster_50800
LogType: stderr
LogLength: 655
Log Contents:
readDirect: FSDataInputStream#read error:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:144)
[11:11:48] include/dmlc/logging.h:245: [11:11:48] src/io/hdfs_filesys.cc:44: HDFSStream.hdfsRead Error:Unknown error 255
terminate called after throwing an instance of 'dmlc::Error'
what(): [11:11:48] src/io/hdfs_filesys.cc:44: HDFSStream.hdfsRead Error:Unknown error 255
LogType: stdout
LogLength: 38
Log Contents:
[11:11:46] start datanode16.data.cluster:0
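The kill above comes from the NodeManager's virtual-memory check, not from real memory pressure: the container uses only ~76 MB of its 2 GB physical allocation, but the xgboost process maps ~32 GB of virtual address space, blowing past the 4.2 GB vmem limit (2 GB × the default vmem-pmem ratio of 2.1). A common workaround, assuming you can change the cluster configuration, is to relax or disable this check in `yarn-site.xml` on the NodeManagers; the property names below are standard Hadoop 2.x settings:

```xml
<!-- yarn-site.xml (NodeManager side) -->

<!-- Option 1: disable the virtual-memory check entirely (default: true). -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

<!-- Option 2: raise the vmem-to-pmem ratio instead (default: 2.1, which
     is exactly the 4.2 GB limit on a 2 GB container seen in the log). -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
```

The subsequent `java.io.IOException: Filesystem closed` and `HDFSStream.hdfsRead` errors in the worker log are consistent with the container being killed (SIGTERM, exit 143) mid-read rather than with a separate HDFS problem.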
Container: container_1462859744498_485365_01_000001 on datanode46.data.cluster_48791
LogType: stderr
LogLength: 17621
Log Contents:
16/05/30 11:11:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/30 11:11:37 INFO dmlc.ApplicationMaster: Start AM as user=yarn
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: Try to start 0 Servers and 1 Workers
16/05/30 11:11:38 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager.data.cluster/192.168.201.53:8030
16/05/30 11:11:38 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
16/05/30 11:11:38 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
16/05/30 11:11:38 INFO dmlc.ApplicationMaster: [DMLC] ApplicationMaster started
16/05/30 11:11:40 INFO impl.AMRMClientImpl: Received new token for : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode61.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=0, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:40 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:40 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode61.data.cluster:54536
16/05/30 11:11:40 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:43 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=40959,containerID=container_1462859744498_485365_01_000002] is running beyond virtual memory limits. Current usage: 76.1 MB of 2 GB physical memory used; 31.8 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 40959 33704 40959 40959 (bash) 0 0 108699648 268 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000002/stderr
|- 40981 40967 40959 40959 (xgboost) 158 7 33956786176 18112 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 40967 40959 40959 40959 (python) 2 0 131223552 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000002. See LOG at : http://datanode61.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000002/yarn
16/05/30 11:11:43 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000002
16/05/30 11:11:43 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.AMRMClientImpl: Received new token for : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode16.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:
/data/sysdir/hadoop-2.4.1/share/hadoop/common//:
/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs//:
/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce//:
/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib//:
/data/sysdir/hadoop-2.4.1/share/hadoop/yarn//:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=1, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode16.data.cluster:50800
16/05/30 11:11:45 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000004
16/05/30 11:11:45 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode68.data.cluster:19652
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:45 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:49 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=83953,containerID=container_1462859744498_485365_01_000003] is running beyond virtual memory limits. Current usage: 197.6 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000003 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 83953 29156 83953 83953 (bash) 0 0 108650496 314 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000003/stderr
|- 83986 83966 83953 83953 (xgboost) 355 22 33988182016 49165 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 83966 83953 83953 83953 (python) 1 1 131186688 1107 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000003. See LOG at : http://datanode16.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000003/yarn
16/05/30 11:11:49 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000003
16/05/30 11:11:49 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.AMRMClientImpl: Received new token for : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/launcher.py" } size: 2696 timestamp: 1464577893786 type: FILE visibility: APPLICATION, xgboost=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/xgboost" } size: 1947053 timestamp: 1464577893921 type: FILE visibility: APPLICATION, libstdc++.so.6=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/libstdc++.so.6" } size: 6469571 timestamp: 1464577894125 type: FILE visibility: APPLICATION, mushroom.aws.conf=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/mushroom.aws.conf" } size: 855 timestamp: 1464577894243 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1462859744498_485365/dmlc-yarn.jar" } size: 21292 timestamp: 1464577894380 type: FILE visibility: APPLICATION}
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: {DMLC_NODE_HOST=datanode123.data.cluster, DMLC_ROLE=worker, DMLC_TASK_ID=0, CLASSPATH=${CLASSPATH}:./:/data/sysdir/hadoop-2.4.1/etc/hadoop:/data/sysdir/hadoop-2.4.1/share/hadoop/common/*:/data/sysdir/hadoop-2.4.1/share/hadoop/common/lib/*:/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/*:/data/sysdir/hadoop-2.4.1/share/hadoop/hdfs/lib/*:/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/*:/data/sysdir/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/*:/data/sysdir/hadoop-2.4.1/share/hadoop/yarn/lib/*, DMLC_NUM_WORKER=1, PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_ATTEMPT=2, DMLC_NUM_SERVER=0, DMLC_SERVER_MEMORY_MB=1024, DMLC_JOB_ARCHIVES=, DMLC_TRACKER_URI=192.168.201.152, DMLC_JOB_CLUSTER=yarn, DMLC_WORKER_MEMORY_MB=1024, DMLC_WORKER_CORES=1, LD_LIBRARY_PATH=/data/home/tangshouxu/gbdt/xgboost/lib::$HADOOP_HDFS_HOME/lib/native:$JAVA_HOME/jre/lib/amd64/server, DMLC_TRACKER_PORT=9092, DMLC_SERVER_CORES=1}
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode123.data.cluster:60096
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000006
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode136.data.cluster:19129
16/05/30 11:11:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1462859744498_485365_01_000007
16/05/30 11:11:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode96.data.cluster:40278
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:51 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
16/05/30 11:11:56 WARN dmlc.ApplicationMaster: KILLED_EXCEEDED_PMEM
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 143 Diagnostics:Container [pid=137020,containerID=container_1462859744498_485365_01_000005] is running beyond virtual memory limits. Current usage: 117.9 MB of 2 GB physical memory used; 31.9 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1462859744498_485365_01_000005 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 137052 137028 137020 137020 (xgboost) 243 14 33985527808 28756 ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
|- 137020 137486 137020 137020 (bash) 0 0 108654592 312 /bin/bash -c ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model 1>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stdout 2>/data/sysdir/hadoop-2.4.1/logs/userlogs/application_1462859744498_485365/container_1462859744498_485365_01_000005/stderr
|- 137028 137020 137020 137020 (python) 2 0 131186688 1108 python ./launcher.py ./xgboost ./mushroom.aws.conf nthread=2 data=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/train/agaricus.txt.train eval[test]=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/test/agaricus.txt.test model_dir=hdfs://AutoclusterCluster/dmp/person/shouxu/gbdt/model
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Task 0 failed on container_1462859744498_485365_01_000005. See LOG at : http://datanode123.data.cluster:8042/node/containerlogs/container_1462859744498_485365_01_000005/yarn
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: [DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.NMClientAsyncImpl: Processing Event EventType: STOP_CONTAINER for Container container_1462859744498_485365_01_000005
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: onContainerStopped Invoked
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Application completed. Stopping running containers
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode61.data.cluster:54536
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode16.data.cluster:50800
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode123.data.cluster:60096
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode68.data.cluster:19652
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode96.data.cluster:40278
16/05/30 11:11:56 INFO impl.ContainerManagementProtocolProxy: Closing proxy : datanode136.data.cluster:19129
16/05/30 11:11:56 INFO dmlc.ApplicationMaster: Diagnostics., num_tasks1, finished=0, failed=1
[DMLC] Task 0 failed more than 3times
16/05/30 11:11:56 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "main" java.lang.Exception: Application not successful
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.run(ApplicationMaster.java:290)
at org.apache.hadoop.yarn.dmlc.ApplicationMaster.main(ApplicationMaster.java:115)
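The repeated exit code 143 comes from YARN's virtual-memory check, not from xgboost itself: the diagnostic shows only 117.9 MB of physical memory in use, but 31.9 GB of reserved address space against a 4.2 GB cap. The 4.2 GB figure is the container's 2 GB physical limit multiplied by the default `yarn.nodemanager.vmem-pmem-ratio` of 2.1, and the huge vmem footprint is typically just address-space reservation (e.g. by the threading runtime), not real usage. A common workaround is to relax or disable the vmem check on the NodeManagers. The snippet below is a hedged sketch of that change to `yarn-site.xml` (property names are standard YARN settings; the ratio value of 10 is an illustrative choice, not a recommendation from this log):

```xml
<!-- yarn-site.xml on the NodeManagers (requires NM restart).
     Either raise the virtual-to-physical memory ratio... -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value> <!-- illustrative; default is 2.1 -->
</property>

<!-- ...or disable the virtual-memory check entirely,
     keeping the physical-memory check in place: -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```

If changing cluster configuration is not an option, reducing the worker's address-space footprint (e.g. a lower `nthread` for xgboost) may also keep the container under the vmem cap.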