dl4j java.lang.RuntimeException: Can't allocate [HOST] memory && java.lang.OutOfMemoryError: Physical memory usage is too high #4335
Comments
Unfortunately, when not using workspaces and ParallelWrapper, training starts, but after 274 iterations another error appears: "java.lang.OutOfMemoryError: Cannot allocate new FloatPointer(1): totalBytes = 257, physicalBytes = 10G". Log output as below:
17:27:06.827 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
Increase the available memory with the java -Xmx command-line option, or tune the limit more finely with the "org.bytedeco.javacpp.maxphysicalbytes" system property.
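To check which limits a JVM was actually started with, a small stdlib-only sketch like the one below may help. It prints the heap limit set by -Xmx and the raw values of the two JavaCPP system properties. The class name `MemoryLimits` and the helper `parseBytes` are hypothetical and only for illustration; the suffix parsing assumes binary units (K, M, G, T) and is not taken from JavaCPP's own implementation.

```java
import java.util.Locale;

public class MemoryLimits {

    /** Parses sizes such as "8G", "512m" or "1024" into bytes (hypothetical helper, binary units). */
    static long parseBytes(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        long multiplier = 1L;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default: break; // no suffix: the value is already in bytes
        }
        String digits = multiplier == 1L ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits.trim()) * multiplier;
    }

    /** Reports the heap limit and the JavaCPP properties as seen by this JVM. */
    static String describe() {
        long heap = Runtime.getRuntime().maxMemory(); // controlled by -Xmx
        String maxBytes = System.getProperty("org.bytedeco.javacpp.maxbytes", "(not set)");
        String maxPhysical = System.getProperty("org.bytedeco.javacpp.maxphysicalbytes", "(not set)");
        return "JVM max heap: " + heap + " bytes\n"
                + "org.bytedeco.javacpp.maxbytes: " + maxBytes + "\n"
                + "org.bytedeco.javacpp.maxphysicalbytes: " + maxPhysical;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

It could be launched as, for example, `java -Xmx8G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=16G MemoryLimits`; the sizes here are placeholders, not recommendations for this machine.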
I've tried that; here are my Java options:
Memory use should be a lot lower after this PR: #4900
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Issue Description
I'm training an RNN on a 4-GPU machine, but I get a "java.lang.RuntimeException: Can't allocate [HOST] memory: 1997324; threadId: 35" error when using workspaces (without workspaces, it works fine).
Environment Information
cpu: 2 * 8cores
memory: 64G
gpu: 4 * GeForce GTX 1080 Ti (11G ram each)
Version Information
jdk version: openjdk version "1.8.0_151"
dl4j version: 0.9.1
os: Ubuntu 16.04.3 LTS
cuda version: 8.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.98 Thu Oct 26 15:16:01 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
pom file
(There are some Spark dependencies because I can choose to train the network on Spark or on GPUs.)
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>dl4j-spark_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-parallel-wrapper_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-kryo_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-spark_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
    </dependency>
</dependencies>
<profiles>
    <profile>
        <id>use_cpu</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-native-platform</artifactId>
            </dependency>
        </dependencies>
    </profile>
    <profile>
        <id>use_gpu</id>
        <activation>
            <activeByDefault>false</activeByDefault>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.nd4j</groupId>
                <artifactId>nd4j-cuda-${cuda.version}-platform</artifactId>
            </dependency>
        </dependencies>
    </profile>
</profiles>
code
(network: 2 hidden layers, each with 80 hidden neurons, batch size=10, input feature size=1011, time series length=from 10 to 100, output=1011, tbptt length=10)
`
Nd4j.setDataType(DataBuffer.Type.HALF);
// DataTypeUtil.setDTypeForContext(DataBuffer.Type.HALF);
// CudaEnvironment.getInstance().getConfiguration()
// .allowMultiGPU(true)
// .setMaximumDeviceCacheableLength(1024 * 1024 * 1024L)
// .setMaximumDeviceCache(6L * 1024 * 1024 * 1024L)
// .setMaximumHostCacheableLength(1024 * 1024 * 1024L)
// .setMaximumHostCache(6L * 1024 * 1024 * 1024L)
// .allowCrossDeviceAccess(true);
long st = System.currentTimeMillis();
ItemSeqIterator train_data = dataloader.prepareData(params.train_data_file, params.batch_size, params.augment_sample);
logger.info("get {} training data, cost {} seconds", train_data.numExamples(), (System.currentTimeMillis()-st)/1000);
st = System.currentTimeMillis();
ItemSeqIterator test_data = dataloader.prepareData(params.test_data_file, 1, 0);
logger.info("get {} test data, cost {} seconds", test_data.numExamples(), (System.currentTimeMillis()-st)/1000);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .iterations(1)
        .learningRate(params.learning_rate)
        .trainingWorkspaceMode(WorkspaceMode.SEPARATE)
        .inferenceWorkspaceMode(WorkspaceMode.SEPARATE)
        .seed(rd_seed)
        .regularization(true)
        .l2(params.l2_norm_coff)
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.RMSPROP)
        .list()
        .layer(0, new GravesLSTM.Builder()
                .nIn(train_data.inputColumns())
                .nOut(params.lstm_layer_size)
                .activation(Activation.TANH)
                .dropOut(params.dropout)
                .build())
        .layer(1, new GravesLSTM.Builder()
                .nIn(params.lstm_layer_size)
                .nOut(params.lstm_layer_size)
                .activation(Activation.TANH)
                .dropOut(params.dropout)
                .build())
        .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(params.lstm_layer_size)
                .nOut(train_data.totalOutcomes())
                .build())
        .backpropType(BackpropType.TruncatedBPTT)
        .tBPTTForwardLength(params.tbptt_length)
        .tBPTTBackwardLength(params.tbptt_length)
        .pretrain(false)
        .backprop(true)
        .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
logger.info("network has {} parameters", net.numParams());
net.setListeners(new ScoreIterationListener(1));
ParallelWrapper wrapper = new ParallelWrapper.Builder(net)
        .prefetchBuffer(4)
        .workers(4)
        .averagingFrequency(1)
        .reportScoreAfterAveraging(true)
        .workspaceMode(WorkspaceMode.SEPARATE)
        .build();
Nd4j.getMemoryManager().setAutoGcWindow(5000);
Nd4j.getMemoryManager().togglePeriodicGc(false);
logger.info("Starting training");
for (int i = 0; i < params.num_epochs; i++) {
    st = System.currentTimeMillis();
    logger.info("epoch {} start", i);
    wrapper.fit(train_data);
    // net.fit(train_data);
    // Pass the epoch index as well: the message has two {} placeholders.
    logger.info("epoch {} complete, cost {} seconds, start evaluating", i, (System.currentTimeMillis() - st) / 1000);
}
`
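For a sense of scale, the network described above (two GravesLSTM layers of 80 units, 1011 inputs and 1011 outputs, batch size 10, sequences up to 100 steps) can be sized with a rough calculation. The sketch below assumes the usual GravesLSTM parameter layout (four gates, three peephole weights per unit, one bias per gate) and 2 bytes per value for HALF precision. The class and method names are hypothetical, and the figures are estimates rather than values read from the log.

```java
public class LstmSizeEstimate {

    /** Parameters of one GravesLSTM layer: input weights, recurrent weights plus 3 peephole columns, biases. */
    static long gravesLstmParams(long nIn, long nOut) {
        return nIn * 4 * nOut + nOut * (4 * nOut + 3) + 4 * nOut;
    }

    /** Parameters of the output layer: one weight per input/output pair plus one bias per output. */
    static long outputLayerParams(long nIn, long nOut) {
        return nIn * nOut + nOut;
    }

    /** Size in bytes of one feature tensor of shape [batch, features, timeSteps]. */
    static long featureBytes(long batch, long features, long timeSteps, long bytesPerValue) {
        return batch * features * timeSteps * bytesPerValue;
    }

    public static void main(String[] args) {
        long total = gravesLstmParams(1011, 80)
                + gravesLstmParams(80, 80)
                + outputLayerParams(80, 1011);
        System.out.println("parameters: " + total);
        System.out.println("parameter bytes (HALF): " + total * 2);
        // Longest sequence in the issue: 100 time steps, before tBPTT splits it into segments of 10.
        System.out.println("largest feature tensor bytes (HALF): " + featureBytes(10, 1011, 100, 2));
    }
}
```

Under these assumptions the model has 483,331 parameters (under 1 MB in half precision) and the largest feature tensor is about 2 MB, so the model itself is small compared with the 64G of host memory and 11G per GPU listed above.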
log
16:34:37.506 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
16:34:42.266 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for NativeOps: 32
16:34:43.068 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.074 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.076 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.078 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.080 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.081 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.083 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.085 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.086 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.088 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.090 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.091 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.094 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.095 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.097 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.099 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.101 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.102 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.104 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.106 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.107 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.109 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.111 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.112 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.114 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.116 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.117 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.120 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.121 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.123 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.125 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.126 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [0]...
16:34:43.484 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.491 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.494 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.497 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.500 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.503 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.505 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.508 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.511 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.514 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.517 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.519 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.523 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.525 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.528 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.530 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.533 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.535 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.539 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.542 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.545 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.548 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.551 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.554 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.556 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.559 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.562 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.565 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.568 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.570 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.573 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.576 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [1]...
16:34:43.921 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.925 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.926 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.928 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.929 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.931 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.932 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.933 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.935 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.936 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.938 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.939 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.941 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.942 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.944 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.945 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.947 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.948 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.949 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.951 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.952 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.954 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.955 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.956 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.958 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.959 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.961 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.963 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.964 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.965 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.967 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:43.968 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [2]...
16:34:44.284 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.288 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.289 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.290 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.292 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.293 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.295 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.296 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.298 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.299 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.300 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.302 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.304 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.305 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.307 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.308 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.309 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.311 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.312 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.314 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.315 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.317 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.318 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.319 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.321 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.322 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.324 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.326 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.327 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.329 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.330 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.331 [main] DEBUG o.n.j.a.c.impl.BasicContextPool - Creating new stream for thread: [1], device: [3]...
16:34:44.336 [main] DEBUG o.n.j.c.CudaAffinityManager - Mapping thread [1] to device [0], out of [4] devices...
16:34:44.336 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [25] to device [0], out of [4] devices...
16:34:44.337 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [26] to device [0], out of [4] devices...
16:34:44.337 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [27] to device [0], out of [4] devices...
16:34:44.337 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [28] to device [0], out of [4] devices...
16:34:44.337 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [29] to device [0], out of [4] devices...
16:34:44.337 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [30] to device [0], out of [4] devices...
16:34:44.365 [main] DEBUG org.reflections.Reflections - going to scan these urls:
jar:file:/data/lib/nd4j-native-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-linux-x86_64.jar!/
jar:file:/data/lib/nd4j-api-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-ppc64le.jar!/
jar:file:/data/lib/nd4j-jackson-0.9.1.jar!/
jar:file:/data/lib/nd4j-parameter-server-model-0.9.1.jar!/
jar:file:/data/lib/nd4j-base64-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-macosx-x86_64.jar!/
jar:file:/data/lib/nd4j-context-0.9.1.jar!/
jar:file:/data/lib/jackson-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-x86_64.jar!/
jar:file:/data/lib/nd4j-aeron-0.9.1.jar!/
jar:file:/data/lib/nd4j-kryo_2.11-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-windows-x86_64.jar!/
jar:file:/data/lib/nd4j-parameter-server-client-0.9.1.jar!/
jar:file:/data/lib/nd4j-common-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-api-0.9.1.jar!/
jar:file:/data/lib/nd4j-buffer-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-linux-ppc64le.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-windows-x86_64.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-macosx-x86_64.jar!/
jar:file:/data/lib/nd4j-parameter-server-0.9.1.jar!/
16:34:44.480 [main] INFO org.reflections.Reflections - Reflections took 112 ms to scan 23 urls, producing 31 keys and 227 values
16:34:44.605 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
16:34:44.605 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
16:34:44.605 [main] INFO o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
16:34:44.607 [main] INFO o.n.l.j.o.e.CudaExecutioner - Device name: [GeForce GTX 1080 Ti]; CC: [6.1]; Total/free memory: [11712987136]
16:34:44.607 [main] INFO o.n.l.j.o.e.CudaExecutioner - Device name: [GeForce GTX 1080 Ti]; CC: [6.1]; Total/free memory: [11715084288]
16:34:44.607 [main] INFO o.n.l.j.o.e.CudaExecutioner - Device name: [GeForce GTX 1080 Ti]; CC: [6.1]; Total/free memory: [11715084288]
16:34:44.607 [main] INFO o.n.l.j.o.e.CudaExecutioner - Device name: [GeForce GTX 1080 Ti]; CC: [6.1]; Total/free memory: [11715084288]
16:36:00.970 [main] INFO c.a.r.m.itemseqrnn.TrainByLocalFile - get 6123748 training data, cost 76 seconds
16:36:01.084 [main] INFO c.a.r.m.itemseqrnn.TrainByLocalFile - get 30969 test data, cost 0 seconds
16:36:21.949 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 1
16:36:22.173 [main] DEBUG org.reflections.Reflections - going to scan these urls:
file:/data/lib/scala-java8-compat_2.11-0.3.0.jar
file:/data/lib/nd4j-parameter-server-model-0.9.1.jar
file:/data/lib/commons-cli-1.2.jar
file:/data/lib/scala-stm_2.11-0.7.jar
file:/data/lib/snappy-0.2.jar
file:/data/lib/nd4j-cuda-8.0-0.9.1-macosx-x86_64.jar
file:/data/lib/play-functional_2.11-2.4.6.jar
file:/data/lib/jersey-container-servlet-core-2.22.2.jar
file:/data/lib/mapdb-3.0.5.jar
file:/data/lib/api-util-1.0.0-M20.jar
file:/data/lib/parquet-generator-1.7.0.jar
file:/data/lib/scala-library-2.11.8.jar
file:/data/lib/commons-beanutils-core-1.8.0.jar
file:/data/lib/parquet-column-1.7.0.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-macosx-x86_64.jar
file:/data/lib/jackson-databind-2.6.5.jar
file:/data/lib/commons-codec-1.10.jar
file:/data/lib/nd4j-native-0.9.1-linux-ppc64le.jar
file:/data/lib/aopalliance-1.0.jar
file:/data/lib/derby-10.10.2.0.jar
file:/data/lib/play-netty-utils-2.4.6.jar
file:/data/lib/datanucleus-api-jdo-3.2.6.jar
file:/data/lib/jodd-core-3.5.2.jar
file:/data/lib/avro-ipc-1.7.7-tests.jar
file:/data/lib/openblas-0.2.19-1.3-android-x86.jar
file:/data/lib/htrace-core-3.1.0-incubating.jar
file:/data/lib/slf4j-log4j12-1.7.16.jar
file:/data/lib/leptonica-1.73-1.3-linux-x86_64.jar
file:/data/lib/netty-3.8.0.Final.jar
file:/data/lib/scala-reflect-2.11.7.jar
file:/data/lib/leveldb-api-0.5.jar
file:/data/lib/elsa-3.0.0-M5.jar
file:/data/lib/janino-2.7.8.jar
file:/data/lib/joda-convert-1.7.jar
file:/data/lib/cuda-8.0-6.0-1.3.jar
file:/data/lib/leptonica-1.73-1.3-linux-x86.jar
file:/data/lib/joni-2.1.2.jar
file:/data/lib/RoaringBitmap-0.5.11.jar
file:/data/lib/leptonica-1.73-1.3-android-arm.jar
file:/data/lib/opencv-3.2.0-1.3-linux-x86_64.jar
file:/data/lib/pyrolite-4.9.jar
file:/data/lib/hibernate-validator-5.0.3.Final.jar
file:/data/lib/scala-compiler-2.11.0.jar
file:/data/lib/leptonica-1.73-1.3-linux-ppc64le.jar
file:/data/lib/deeplearning4j-nn-0.9.1.jar
file:/data/lib/guice-assistedinject-4.0.jar
file:/data/lib/findbugs-annotations-1.3.9-1.jar
file:/data/lib/jersey-media-jaxb-2.22.2.jar
file:/data/lib/akka-actor_2.11-2.3.13.jar
file:/data/lib/jtransforms-2.4.0.jar
file:/data/lib/hbase-protocol-1.2.5.jar
file:/data/lib/imageio-bmp-3.1.1.jar
file:/data/lib/jaxb-core-2.2.7.jar
file:/data/lib/c3p0-0.9.5.2.jar
file:/data/lib/commons-collections-3.2.1.jar
file:/data/lib/compress-lzf-1.0.3.jar
file:/data/lib/openblas-0.2.19-1.3-linux-x86.jar
file:/data/lib/logback-core-1.1.3.jar
file:/data/lib/cuda-8.0-6.0-1.3-linux-x86_64.jar
file:/data/lib/javax.annotation-api-1.2.jar
file:/data/lib/httpcore-nio-4.4.4.jar
file:/data/lib/zookeeper-3.4.5.jar
file:/data/lib/bonecp-0.8.0.RELEASE.jar
file:/data/lib/ffmpeg-3.2.1-1.3.jar
file:/data/lib/aopalliance-repackaged-2.4.0-b34.jar
file:/data/lib/datavec-spark_2.11-0.9.1_spark_2.jar
file:/data/lib/bson-3.5.0.jar
file:/data/lib/ivy-2.4.0.jar
file:/data/lib/calcite-core-1.2.0-incubating.jar
file:/data/lib/opencv-3.2.0-1.3-windows-x86.jar
file:/data/lib/breeze_2.11-0.11.2.jar
file:/data/lib/antlr-2.7.7.jar
file:/data/lib/commons-configuration-1.6.jar
file:/data/lib/hk2-locator-2.4.0-b34.jar
file:/data/lib/imageio-psd-3.1.1.jar
file:/data/lib/leveldb-0.5.jar
file:/data/lib/JavaEWAH-0.3.2.jar
file:/data/lib/openblas-0.2.19-1.3-linux-x86_64.jar
file:/data/lib/opencv-platform-3.2.0-1.3.jar
file:/data/lib/opencv-3.2.0-1.3-linux-x86.jar
file:/data/lib/kryo-4.0.0.jar
file:/data/lib/classmate-1.0.0.jar
file:/data/lib/opencsv-2.3.jar
file:/data/lib/spring-core-4.1.6.RELEASE.jar
file:/data/lib/deeplearning4j-ui-components-0.9.1.jar
file:/data/lib/commons-digester-1.8.jar
file:/data/lib/parquet-hadoop-bundle-1.6.0.jar
file:/data/lib/jsr305-1.3.9.jar
file:/data/lib/nd4j-cuda-8.0-0.9.1-windows-x86_64.jar
file:/data/lib/jackson-datatype-jsr310-2.4.4.jar
file:/data/lib/json-20090211.jar
file:/data/lib/stream-2.7.0.jar
file:/data/lib/deeplearning4j-core-0.9.1.jar
file:/data/lib/commons-lang-2.6.jar
file:/data/lib/artoolkitplus-2.3.1-1.3.jar
file:/data/lib/unused-1.0.0.jar
file:/data/lib/hk2-utils-2.4.0-b34.jar
file:/data/lib/deeplearning4j-modelimport-0.9.1.jar
file:/data/lib/hive-exec-1.2.1.spark2.jar
file:/data/lib/objenesis-2.2.jar
file:/data/lib/chill-java-0.8.0.jar
file:/data/lib/play-iteratees_2.11-2.4.6.jar
file:/data/lib/hbase-client-1.2.5.jar
file:/data/lib/nd4j-native-0.9.1-android-x86.jar
file:/data/lib/json4s-jackson_2.11-3.2.11.jar
file:/data/lib/lz4-1.3.0.jar
file:/data/lib/commons-httpclient-3.1.jar
file:/data/lib/univocity-parsers-2.1.1.jar
file:/data/lib/commons-collections-3.2.2.jar
file:/data/lib/leptonica-1.73-1.3-windows-x86_64.jar
file:/data/lib/parquet-format-2.3.0-incubating.jar
file:/data/lib/play-netty-server_2.11-2.4.6.jar
file:/data/lib/hbase-annotations-1.2.5.jar
file:/data/lib/akka-remote_2.11-2.3.13.jar
file:/data/lib/kotlin-runtime-1.0.7.jar
file:/data/lib/openblas-0.2.19-1.3-android-arm.jar
file:/data/lib/nd4j-native-api-0.9.1.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-linux-x86_64.jar
file:/data/lib/asm-5.0.4.jar
file:/data/lib/javacv-1.3.3.jar
file:/data/lib/nd4j-kryo_2.11-0.9.1.jar
file:/data/lib/datavec-api-0.9.1.jar
file:/data/lib/jai-imageio-core-1.3.0.jar
file:/data/lib/unirest-java-1.4.9.jar
file:/data/lib/kryo-shaded-3.0.3.jar
file:/data/lib/play-server_2.11-2.4.6.jar
file:/data/lib/metrics-json-3.1.2.jar
file:/data/lib/jcip-annotations-1.0.jar
file:/data/lib/leptonica-1.73-1.3-windows-x86.jar
file:/data/lib/nd4j-cuda-8.0-0.9.1.jar
file:/data/lib/javax.servlet-api-3.1.0.jar
file:/data/lib/scalap-2.11.0.jar
file:/data/lib/play_2.11-2.4.6.jar
file:/data/lib/netty-http-pipelining-1.1.4.jar
file:/data/lib/nd4j-native-0.9.1.jar
file:/data/lib/javax.inject-2.4.0-b34.jar
file:/data/lib/jackson-datatype-jdk8-2.4.4.jar
file:/data/lib/opencv-3.2.0-1.3-macosx-x86_64.jar
file:/data/lib/javax.ws.rs-api-2.0.1.jar
file:/data/lib/spire_2.11-0.7.4.jar
file:/data/lib/guice-4.0.jar
file:/data/lib/config-1.3.0.jar
file:/data/lib/antlr4-runtime-4.5.3.jar
file:/data/lib/jcl-over-slf4j-1.7.16.jar
file:/data/lib/kryo-serializers-0.41.jar
file:/data/lib/libfb303-0.9.2.jar
file:/data/lib/libdc1394-2.2.4-1.3.jar
file:/data/lib/opencv-3.2.0-1.3-android-arm.jar
file:/data/lib/jul-to-slf4j-1.7.16.jar
file:/data/lib/scala-xml_2.11-1.0.2.jar
file:/data/lib/metrics-graphite-3.1.2.jar
file:/data/lib/stax-api-1.0.1.jar
file:/data/lib/imageio-tiff-3.1.1.jar
file:/data/lib/hamcrest-core-1.3.jar
file:/data/lib/common-lang-3.1.1.jar
file:/data/lib/validation-api-1.1.0.Final.jar
file:/data/lib/junit-4.12.jar
file:/data/lib/pmml-model-1.2.15.jar
file:/data/lib/leptonica-1.73-1.3-macosx-x86_64.jar
file:/data/lib/httpcore-4.4.4.jar
file:/data/models/
file:/data/lib/akka-slf4j_2.11-2.3.13.jar
file:/data/lib/openblas-0.2.19-1.3-linux-ppc64le.jar
file:/data/lib/api-asn1-api-1.0.0-M20.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-linux-ppc64le.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-windows-x86.jar
file:/data/lib/datanucleus-core-3.2.10.jar
file:/data/lib/guice-3.0.jar
file:/data/lib/openblas-0.2.19-1.3.jar
file:/data/lib/cuda-8.0-6.0-1.3-macosx-x86_64.jar
file:/data/lib/eclipse-collections-7.1.1.jar
file:/data/lib/neoitertools-1.0.0.jar
file:/data/lib/jaxb-impl-2.2.7.jar
file:/data/lib/logback-classic-1.1.3.jar
file:/data/lib/jackson-0.9.1.jar
file:/data/lib/pmml-schema-1.2.15.jar
file:/data/lib/datavec-hadoop-0.9.1.jar
file:/data/lib/deeplearning4j-ui-model-0.9.1.jar
file:/data/lib/aeron-all-1.0.4.jar
file:/data/lib/nd4j-native-0.9.1-linux-x86_64.jar
file:/data/lib/pmml-agent-1.1.15.jar
file:/data/lib/imageio-core-3.1.1.jar
file:/data/lib/reflectasm-1.11.3.jar
file:/data/lib/minlog-1.3.0.jar
file:/data/lib/jackson-module-paranamer-2.6.5.jar
file:/data/lib/junit-4.8.2.jar
file:/data/lib/nd4j-native-0.9.1-macosx-x86_64.jar
file:/data/lib/tomcat-servlet-api-8.0.21.jar
file:/data/lib/jackson-module-scala_2.11-2.6.5.jar
file:/data/lib/jackson-core-2.6.5.jar
file:/data/lib/javolution-5.5.1.jar
file:/data/lib/hk2-api-2.4.0-b34.jar
file:/data/lib/kotlin-stdlib-1.0.7.jar
file:/data/lib/jackson-core-asl-1.9.13.jar
file:/data/lib/mesos-0.21.1-shaded-protobuf.jar
file:/data/lib/twirl-api_2.11-1.1.1.jar
file:/data/lib/deeplearning4j-parallel-wrapper_2.11-0.9.1.jar
file:/data/lib/imageio-metadata-3.1.1.jar
file:/data/lib/play-java_2.11-2.4.6.jar
file:/data/lib/uncommons-maths-1.2.2a.jar
file:/data/lib/jetty-util-6.1.26.jar
file:/data/lib/xercesImpl-2.11.0.jar
file:/data/lib/httpmime-4.5.2.jar
file:/data/lib/sqlite-jdbc-3.15.1.jar
file:/data/lib/jdo-api-3.0.1.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-windows-x86_64.jar
file:/data/lib/xz-1.5.jar
file:/data/lib/play-datacommons_2.11-2.4.6.jar
file:/data/lib/avro-mapred-1.7.7-hadoop2.jar
file:/data/lib/commons-logging-1.1.3.jar
file:/data/lib/commons-io-2.4.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3-linux-x86.jar
file:/data/lib/openblas-0.2.19-1.3-windows-x86.jar
file:/data/lib/jets3t-0.7.1.jar
file:/data/lib/Agrona-0.5.4.jar
file:/data/lib/commons-net-2.2.jar
file:/data/lib/nd4j-buffer-0.9.1.jar
file:/data/lib/opencv-3.2.0-1.3.jar
file:/data/lib/nd4j-parameter-server-client-0.9.1.jar
file:/data/lib/parquet-jackson-1.7.0.jar
file:/data/lib/akka-contrib_2.11-2.3.13.jar
file:/data/lib/pmml-schema-1.1.15.jar
file:/data/lib/opencv-3.2.0-1.3-windows-x86_64.jar
file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-ppc64le.jar
file:/data/lib/nd4j-jackson-0.9.1.jar
file:/data/lib/oro-2.0.8.jar
file:/data/lib/build-link-2.4.6.jar
file:/data/lib/jersey-client-2.22.2.jar
file:/data/lib/commons-dbcp-1.4.jar
file:/data/lib/protobuf-java-2.5.0.jar
file:/data/lib/curator-framework-2.4.0.jar
file:/data/lib/slf4j-api-1.7.25.jar
file:/data/lib/openblas-0.2.19-1.3-windows-x86_64.jar
file:/data/lib/json4s-core_2.11-3.2.11.jar
file:/data/lib/hive-metastore-1.2.1.spark2.jar
file:/data/lib/typetools-0.4.3.jar
file:/data/lib/common-io-3.1.1.jar
file:/data/lib/akka-persistence-experimental_2.11-2.3.13.jar
file:/data/lib/parquet-common-1.7.0.jar
file:/data/lib/jaxb-api-2.2.7.jar
file:/data/lib/stringtemplate-3.2.1.jar
file:/data/lib/leptonica-1.73-1.3.jar
file:/data/lib/commons-pool-1.5.4.jar
file:/data/lib/nearestneighbor-core-0.9.1.jar
file:/data/lib/libfreenect2-0.2.0-1.3.jar
file:/data/lib/curator-client-2.4.0.jar
file:/data/lib/librealsense-1.9.6-1.3.jar
file:/data/lib/javassist-3.19.0-GA.jar
file:/data/lib/openblas-platform-0.2.19-1.3.jar
file:/data/lib/chill_2.11-0.8.0.jar
file:/data/lib/netty-all-4.0.29.Final.jar
file:/data/lib/curator-recipes-2.4.0.jar
file:/data/lib/gson-2.8.1.jar
file:/data/lib/apache-log4j-extras-1.2.17.jar
file:/data/lib/cuda-8.0-6.0-1.3-windows-x86_64.jar
file:/data/lib/calcite-avatica-1.2.0-incubating.jar
file:/data/lib/jcodings-1.0.8.jar
file:/data/lib/metrics-core-3.1.2.jar
file:/data/lib/flandmark-1.07-1.3.jar
file:/data/lib/scala-parser-combinators_2.11-1.0.1.jar
file:/data/lib/spring-beans-4.1.6.RELEASE.jar
file:/data/lib/parquet-encoding-1.7.0.jar
file:/data/lib/leptonica-platform-1.73-1.3.jar
file:/data/lib/opencv-3.2.0-1.3-android-x86.jar
file:/data/lib/datanucleus-rdbms-3.2.9.jar
file:/data/lib/freemarker-2.3.23.jar
file:/data/lib/jboss-logging-3.2.1.Final.jar
file:/data/lib/avro-1.7.7.jar
file:/data/lib/jackson-annotations-2.6.5.jar
file:/data/lib/httpasyncclient-4.1.1.jar
file:/data/lib/videoinput-0.200-1.3.jar
file:/data/lib/guava-18.0.jar
file:/data/lib/metrics-jvm-3.1.2.jar
file:/data/models/al-rec-models-itemseq-1.0.jar
file:/data/lib/cuda-8.0-6.0-1.3-linux-ppc64le.jar
file:/data/lib/ST4-4.0.4.jar
file:/data/lib/jersey-container-servlet-2.22.2.jar
file:/data/lib/jersey-server-2.22.2.jar
file:/data/lib/jersey-common-2.22.2.jar
file:/data/lib/leptonica-1.73-1.3-linux-armhf.jar
file:/data/lib/apacheds-i18n-2.0.0-M15.jar
file:/data/lib/leptonica-1.73-1.3-android-x86.jar
file:/data/lib/commons-math-2.1.jar
file:/data/lib/eigenbase-properties-1.1.5.jar
file:/data/lib/commons-beanutils-1.7.0.jar
file:/data/lib/slf4j-log4j12-1.7.25.jar
file:/data/lib/snakeyaml-1.12.jar
file:/data/lib/snappy-java-1.1.2.6.jar
file:/data/lib/flycapture-2.9.3.43-1.3.jar
file:/data/lib/objenesis-2.1.jar
file:/data/lib/cuda-platform-8.0-6.0-1.3.jar
file:/data/lib/datavec-data-image-0.9.1.jar
file:/data/lib/nd4j-native-0.9.1-windows-x86_64.jar
file:/data/lib/nd4j-native-platform-0.9.1.jar
file:/data/lib/nd4j-base64-0.9.1.jar
file:/data/lib/nd4j-api-0.9.1.jar
file:/data/lib/calcite-linq4j-1.2.0-incubating.jar
file:/data/lib/avro-ipc-1.7.7.jar
file:/data/lib/nd4j-aeron-0.9.1.jar
file:/data/lib/libfreenect-0.5.3-1.3.jar
file:/data/lib/mysql-connector-java-6.0.6.jar
file:/data/lib/nd4j-native-0.9.1-android-arm.jar
file:/data/lib/core-1.1.2.jar
file:/data/lib/metrics-core-2.2.0.jar
file:/data/lib/openblas-0.2.19-1.3-macosx-x86_64.jar
file:/data/lib/xmlenc-0.52.jar
file:/data/lib/paranamer-2.3.jar
file:/data/lib/play-exceptions-2.4.6.jar
file:/data/lib/joda-time-2.9.3.jar
file:/data/lib/common-image-3.1.1.jar
file:/data/lib/apacheds-kerberos-codec-2.0.0-M15.jar
file:/data/lib/opencv-3.2.0-1.3-linux-ppc64le.jar
file:/data/lib/eclipse-collections-forkjoin-7.1.1.jar
file:/data/lib/hdf5-1.10.0-patch1-1.3.jar
file:/data/lib/akka-cluster_2.11-2.3.13.jar
file:/data/lib/nd4j-cuda-8.0-platform-0.9.1.jar
file:/data/lib/javacpp-1.3.3.jar
file:/data/lib/jta-1.1.jar
file:/data/lib/mongodb-driver-3.5.0.jar
file:/data/lib/hdf5-platform-1.10.0-patch1-1.3.jar
file:/data/lib/deeplearning4j-play_2.11-0.9.1.jar
file:/data/lib/commons-compress-1.8.jar
file:/data/lib/scalatest_2.11-2.2.6.jar
file:/data/lib/commons-compiler-2.7.6.jar
file:/data/lib/xbean-asm5-shaded-4.4.jar
file:/data/lib/hbase-common-1.2.5.jar
file:/data/lib/pmml-model-1.1.15.jar
file:/data/lib/reflections-0.9.10.jar
file:/data/lib/jcommander-1.27.jar
file:/data/lib/libthrift-0.9.2.jar
file:/data/lib/nd4j-parameter-server-0.9.1.jar
file:/data/lib/xml-apis-1.4.01.jar
file:/data/lib/commons-math3-3.4.1.jar
file:/data/lib/jersey-guava-2.22.2.jar
file:/data/lib/slf4j-api-1.7.16.jar
file:/data/lib/json4s-ast_2.11-3.2.11.jar
file:/data/lib/mchange-commons-java-0.2.11.jar
file:/data/lib/opencv-3.2.0-1.3-linux-armhf.jar
file:/data/lib/arpack_combined_all-0.1.jar
file:/data/lib/breeze-macros_2.11-0.11.2.jar
file:/data/lib/lombok-1.16.16.jar
file:/data/lib/c3p0-0.9.1.2.jar
file:/data/lib/leveldbjni-all-1.8.jar
file:/data/lib/imageio-jpeg-3.1.1.jar
file:/data/lib/antlr-runtime-3.4.jar
file:/data/lib/javassist-3.18.1-GA.jar
file:/data/lib/log4j-1.2.17.jar
file:/data/lib/stax2-api-3.1.4.jar
file:/data/lib/osgi-resource-locator-1.0.1.jar
file:/data/lib/py4j-0.10.3.jar
file:/data/lib/mongodb-driver-core-3.5.0.jar
file:/data/lib/nd4j-context-0.9.1.jar
file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-x86_64.jar
file:/data/lib/httpclient-4.5.2.jar
file:/data/lib/play-json_2.11-2.4.6.jar
file:/data/lib/javax.inject-1.jar
file:/data/lib/spring-context-4.1.6.RELEASE.jar
file:/data/lib/spire-macros_2.11-0.7.4.jar
file:/data/lib/eclipse-collections-api-7.1.1.jar
file:/data/lib/openblas-0.2.19-1.3-linux-armhf.jar
file:/data/lib/al-rec-common-1.0.jar
file:/data/lib/nd4j-common-0.9.1.jar
file:/data/lib/parquet-hadoop-1.7.0.jar
file:/data/lib/dl4j-spark_2.11-0.9.1_spark_2.jar
file:/data/lib/commons-lang3-3.3.2.jar
file:/data/lib/annotations-2.0.1.jar
file:/data/lib/jackson-mapper-asl-1.9.13.jar
file:/data/lib/guava-20.0.jar
16:36:27.254 [main] INFO org.reflections.Reflections - Reflections took 5080 ms to scan 368 urls, producing 4712 keys and 36863 values
16:36:27.514 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.layers.CenterLossOutputLayer as subtype of org.deeplearning4j.nn.conf.layers.Layer
16:36:27.514 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.modelimport.keras.preprocessors.TensorFlowCnnToFeedForwardPreProcessor as subtype of org.deeplearning4j.nn.conf.InputPreProcessor
16:36:27.515 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.ReshapeVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.515 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.PoolHelperVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.515 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.ShiftVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.520 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.layers.CenterLossOutputLayer as subtype of org.deeplearning4j.nn.conf.layers.Layer
16:36:27.521 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.modelimport.keras.preprocessors.TensorFlowCnnToFeedForwardPreProcessor as subtype of org.deeplearning4j.nn.conf.InputPreProcessor
16:36:27.521 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.ReshapeVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.521 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.PoolHelperVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.521 [main] DEBUG o.d.nn.conf.NeuralNetConfiguration - Registering class for JSON serialization: org.deeplearning4j.nn.conf.graph.ShiftVertex as subtype of org.deeplearning4j.nn.conf.graph.GraphVertex
16:36:27.550 [main] INFO o.d.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: SEPARATE; inference: SEPARATE]
16:36:47.167 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 5
16:36:47.182 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 3
16:36:47.214 [main] DEBUG org.reflections.Reflections - going to scan these urls:
jar:file:/data/lib/nd4j-native-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-linux-x86_64.jar!/
jar:file:/data/lib/nd4j-api-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-ppc64le.jar!/
jar:file:/data/lib/nd4j-jackson-0.9.1.jar!/
jar:file:/data/lib/nd4j-parameter-server-model-0.9.1.jar!/
jar:file:/data/lib/nd4j-base64-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-macosx-x86_64.jar!/
jar:file:/data/lib/nd4j-context-0.9.1.jar!/
jar:file:/data/lib/jackson-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-linux-x86_64.jar!/
jar:file:/data/lib/nd4j-aeron-0.9.1.jar!/
jar:file:/data/lib/nd4j-kryo_2.11-0.9.1.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-windows-x86_64.jar!/
jar:file:/data/lib/nd4j-parameter-server-client-0.9.1.jar!/
jar:file:/data/lib/nd4j-common-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-api-0.9.1.jar!/
jar:file:/data/lib/nd4j-buffer-0.9.1.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-linux-ppc64le.jar!/
jar:file:/data/lib/nd4j-native-0.9.1-windows-x86_64.jar!/
jar:file:/data/lib/nd4j-cuda-8.0-0.9.1-macosx-x86_64.jar!/
jar:file:/data/lib/nd4j-parameter-server-0.9.1.jar!/
16:37:07.698 [main] INFO org.reflections.Reflections - Reflections took 20484 ms to scan 23 urls, producing 420 keys and 1665 values
16:37:28.795 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 2
16:37:28.814 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 4
16:37:28.814 [main] DEBUG o.n.j.handler.impl.CudaZeroHandler - Creating bucketID: 0
16:37:28.859 [main] INFO c.a.r.m.itemseqrnn.TrainByLocalFile - network has 499331 parameters
16:37:28.877 [main] INFO o.d.parallelism.ParallelWrapper - Creating new AveragingTraining instance
16:37:28.878 [main] INFO c.a.r.m.itemseqrnn.TrainByLocalFile - Starting training
16:37:28.878 [main] INFO c.a.r.m.itemseqrnn.TrainByLocalFile - epoch 0 start
16:37:28.878 [main] INFO o.d.parallelism.ParallelWrapper - Using workspaceMode SEPARATE for training
16:37:28.883 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [33] to device [0], out of [4] devices...
16:37:28.883 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [35] to device [1], out of [4] devices...
16:37:28.884 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [37] to device [2], out of [4] devices...
16:37:28.885 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [39] to device [3], out of [4] devices...
16:37:28.885 [main] INFO o.d.parallelism.ParallelWrapper - Creating asynchronous prefetcher...
16:37:28.890 [main] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [40] to device [0], out of [4] devices...
16:37:28.890 [main] INFO o.d.parallelism.ParallelWrapper - Starting ParallelWrapper training round...
16:37:28.896 [ADSI prefetch thread] DEBUG o.n.l.memory.abstracts.Nd4jWorkspace - Steps: 10
16:37:28.912 [ADSI prefetch thread] DEBUG o.n.l.memory.abstracts.Nd4jWorkspace - Steps: 17
16:37:28.928 [ADSI prefetch thread] DEBUG o.n.l.memory.abstracts.Nd4jWorkspace - Steps: 17
16:37:28.943 [ADSI prefetch thread] DEBUG o.n.l.memory.abstracts.Nd4jWorkspace - Steps: 17
16:37:28.958 [ADSI prefetch thread] DEBUG o.n.l.memory.abstracts.Nd4jWorkspace - Steps: 17
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:2172 code=77() "cudaStreamSynchronize(*stream)"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4885 code=77() "result"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4738 code=77() "result"
Exception in thread "ADSI prefetch thread"
16:37:29.020 [ParallelWrapper training thread 0] DEBUG o.d.p.trainer.DefaultTrainer - Terminating all workspaces for trainer_0
16:37:51.246 [ParallelWrapper training thread 0] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [41] to device [0], out of [4] devices...
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
Exception in thread "UniGC thread 5"
Exception in thread "UniGC thread 1"
Exception in thread "UniGC thread 4"
Exception in thread "UniGC thread 3"
Exception in thread "UniGC thread 2" org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1113)
at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:515)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:714)
CUDA error at /home/jenkins/workspace/dl4j/all-multiplatform@2_linux-x86_64/stream1/libnd4j/blas/cuda/NativeOps.cu:4895 code=77() "result"
16:37:51.249 [ParallelWrapper training thread 0] ERROR o.d.parallelism.ParallelWrapper - Uncaught exception: java.lang.RuntimeException: org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1113)
at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:515)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:714)
org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1113)
at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:515)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:714)
org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1113)
at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:515)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:714)
org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:696)
java.lang.RuntimeException: org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:399)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaStream_t.synchronize(cudaStream_t.java:24)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:302)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:470)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:396)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:216)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:327)
at org.nd4j.linalg.jcublas.buffer.CudaIntDataBuffer.<init>(CudaIntDataBuffer.java:53)
at org.nd4j.linalg.jcublas.buffer.CudaIntDataBuffer.<init>(CudaIntDataBuffer.java:81)
at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createInt(CudaDataBufferFactory.java:356)
at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1430)
at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:2045)
at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:47)
at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:64)
at org.nd4j.linalg.jcublas.CachedShapeInfoProvider.createShapeInformation(CachedShapeInfoProvider.java:26)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:163)
at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:335)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:257)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4231)
at org.nd4j.linalg.api.ndarray.BaseNDArray.create(BaseNDArray.java:1967)
at org.nd4j.linalg.api.ndarray.BaseNDArray.subArray(BaseNDArray.java:2135)
at org.nd4j.linalg.api.ndarray.BaseNDArray.get(BaseNDArray.java:4216)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doTruncatedBPTT(MultiLayerNetwork.java:1441)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1824)
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.fit(DefaultTrainer.java:209)
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:335)
... 3 more
Exception in thread "UniGC thread 0" java.lang.RuntimeException: java.lang.NullPointerException
at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:442)
Caused by: java.lang.NullPointerException
at org.nd4j.jita.allocator.pointers.CudaPointer.<init>(CudaPointer.java:22)
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.<init>(cudaEvent_t.java:33)
at org.nd4j.jita.concurrency.EventsProvider.getEvent(EventsProvider.java:34)
at org.nd4j.jita.flow.impl.SynchronousFlowController.registerAction(SynchronousFlowController.java:249)
at org.nd4j.jita.handler.impl.CudaZeroHandler.registerAction(CudaZeroHandler.java:1258)
at org.nd4j.jita.allocator.impl.AtomicAllocator.registerAction(AtomicAllocator.java:1017)
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.invoke(CudaExecutioner.java:1638)
at org.nd4j.linalg.jcublas.ops.executioner.CudaGridExecutioner.pushToGrid(CudaGridExecutioner.java:225)
at org.nd4j.linalg.jcublas.ops.executioner.CudaGridExecutioner.processAsGridOp(CudaGridExecutioner.java:307)
at org.nd4j.linalg.jcublas.ops.executioner.CudaGridExecutioner.exec(CudaGridExecutioner.java:112)
at org.nd4j.linalg.api.ndarray.BaseNDArray.assign(BaseNDArray.java:1267)
at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopyHelper(Shape.java:248)
at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopy(Shape.java:213)
at org.nd4j.linalg.api.ndarray.BaseNDArray.dup(BaseNDArray.java:1714)
at org.nd4j.linalg.jcublas.JCublasNDArray.dup(JCublasNDArray.java:440)
at org.nd4j.linalg.jcublas.JCublasNDArray.migrate(JCublasNDArray.java:689)
at org.nd4j.linalg.dataset.DataSet.migrate(DataSet.java:1339)
at org.deeplearning4j.datasets.iterator.callbacks.InterleavedDataSetCallback.call(InterleavedDataSetCallback.java:66)
at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:420)
org.nd4j.linalg.exception.ND4JException: CUDA exception happened. Terminating. Last op: [null]
at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:55)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:106)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:47)
at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:203)
at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:62)
at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:696)
16:37:51.351 [ParallelWrapper training thread 2] DEBUG o.d.p.trainer.DefaultTrainer - Terminating all workspaces for trainer_2
16:37:51.351 [ParallelWrapper training thread 3] DEBUG o.d.p.trainer.DefaultTrainer - Terminating all workspaces for trainer_3
16:38:11.846 [ParallelWrapper training thread 2] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [42] to device [1], out of [4] devices...
16:38:26.784 [ParallelWrapper training thread 3] DEBUG o.n.j.c.CudaAffinityManager - Manually mapping thread [43] to device [2], out of [4] devices...
16:38:26.784 [ParallelWrapper training thread 1] DEBUG o.d.p.trainer.DefaultTrainer - Terminating all workspaces for trainer_1
16:38:26.787 [ParallelWrapper training thread 2] ERROR o.d.parallelism.ParallelWrapper - Uncaught exception: java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 37
java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 37
16:38:26.787 [ParallelWrapper training thread 3] ERROR o.d.parallelism.ParallelWrapper - Uncaught exception: java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 39
16:38:34.482 [ParallelWrapper training thread 1] ERROR o.d.parallelism.ParallelWrapper - Uncaught exception: java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 35
Caused by: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 37
at org.nd4j.jita.memory.impl.CudaDirectProvider.malloc(CudaDirectProvider.java:59)
at org.nd4j.jita.memory.impl.CudaCachingZeroProvider.malloc(CudaCachingZeroProvider.java:113)
at org.nd4j.jita.memory.impl.CudaFullCachingProvider.malloc(CudaFullCachingProvider.java:91)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:237)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:258)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:470)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:396)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:216)
at org.nd4j.linalg.jcublas.buffer.CudaHalfDataBuffer.<init>(CudaHalfDataBuffer.java:60)
at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createHalf(CudaDataBufferFactory.java:511)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1472)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1442)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:247)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:284)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:566)
at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:252)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:238)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:5014)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4965)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4093)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:598)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:539)
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:262)
... 3 more
java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 35
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:399)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 35
at org.nd4j.jita.memory.impl.CudaDirectProvider.malloc(CudaDirectProvider.java:59)
at org.nd4j.jita.memory.impl.CudaCachingZeroProvider.malloc(CudaCachingZeroProvider.java:113)
at org.nd4j.jita.memory.impl.CudaFullCachingProvider.malloc(CudaFullCachingProvider.java:91)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:237)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:258)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:470)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:396)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:216)
at org.nd4j.linalg.jcublas.buffer.CudaHalfDataBuffer.<init>(CudaHalfDataBuffer.java:60)
at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createHalf(CudaDataBufferFactory.java:511)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1472)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1442)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:247)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:284)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:566)
at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:252)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:238)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:5014)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4965)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4093)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:598)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:539)
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:262)
... 3 more
java.lang.RuntimeException: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 39
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:399)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Can't allocate [HOST] memory: 998662; threadId: 39
at org.nd4j.jita.memory.impl.CudaDirectProvider.malloc(CudaDirectProvider.java:59)
at org.nd4j.jita.memory.impl.CudaCachingZeroProvider.malloc(CudaCachingZeroProvider.java:113)
at org.nd4j.jita.memory.impl.CudaFullCachingProvider.malloc(CudaFullCachingProvider.java:91)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:237)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:258)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:470)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:396)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:216)
at org.nd4j.linalg.jcublas.buffer.CudaHalfDataBuffer.<init>(CudaHalfDataBuffer.java:60)
at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createHalf(CudaDataBufferFactory.java:511)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1472)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1442)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:247)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:284)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:566)
at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:252)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:238)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:5014)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4965)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4093)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:598)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:539)
at org.deeplearning4j.parallelism.trainer.DefaultTrainer.run(DefaultTrainer.java:262)
... 3 more
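
A quick sanity check on the numbers in the log above, as a rough sketch only. The log reports "network has 499331 parameters", and every failed request is for 998662 bytes from a `CudaHalfDataBuffer` constructor under `MultiLayerNetwork.init`. Assuming 2 bytes per half-precision element, that is exactly one flat parameter buffer per trainer, so the failing request is under 1 MB. The class and method names below are hypothetical and only illustrate the arithmetic; they are not part of DL4J or ND4J.

```java
public class AllocationSizeCheck {
    // Bytes needed for a flat buffer of `parameterCount` elements of the given width.
    public static long bytesFor(long parameterCount, int bytesPerElement) {
        return parameterCount * bytesPerElement;
    }

    public static void main(String[] args) {
        long params = 499331L;        // from "network has 499331 parameters"
        long failedRequest = 998662L; // from "Can't allocate [HOST] memory: 998662"

        // Element widths to compare (assumed): half = 2, float = 4, double = 8 bytes.
        for (int width : new int[] {2, 4, 8}) {
            long bytes = bytesFor(params, width);
            String marker = (bytes == failedRequest) ? "  <-- matches the failed request" : "";
            System.out.println(width + " bytes/element -> " + bytes + " bytes" + marker);
        }
    }
}
```

If that reading is right, the request itself is tiny compared with 64G of host RAM, which would point at the allocator or CUDA context state after the earlier `code=77` errors rather than at the request size. That last part is an inference, not something the log states.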