iterateInParallel causes GPU crash if collection size is too large #228

Closed
Bam4d opened this issue Apr 20, 2015 · 8 comments

@Bam4d
Contributor

Bam4d commented Apr 20, 2015

In functions that use the iterateInParallel functionality, such as word2vec, there is no control over how many threads are spawned or how much memory is used.

iterateInParallel in word2vec creates n word lists (where n is the number of sentences to be trained on). As an example, I have 200,000 sentences of 10 words each; iterateInParallel tries to compute all of them on the GPU at the same time, which either segfaults the program or throws CUDA_ERROR_OUT_OF_MEMORY.
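
To illustrate the kind of cap I mean, here is a minimal sketch (not dl4j's actual iterateInParallel; the class and method names are made up for the example): only a fixed number of batches are ever in flight at once, so device memory use stays bounded no matter how large the collection is.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Minimal sketch, not dl4j's API: cap the number of in-flight tasks so only
// maxConcurrent sentence batches are being processed on the GPU at any time.
public class ThrottledIterate {

    public static <T> void iterateThrottled(List<T> items,
                                            Consumer<T> work,
                                            int maxConcurrent) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        Semaphore permits = new Semaphore(maxConcurrent);
        for (T item : items) {
            permits.acquire();                  // block until a slot frees up
            pool.execute(() -> {
                try {
                    work.accept(item);          // e.g. train word2vec on this sentence
                } finally {
                    permits.release();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    }
}
```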

@Bam4d
Contributor Author

Bam4d commented Apr 20, 2015

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8340a8ad40, pid=31001, tid=140200106391296
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libcuda.so.1+0x21bd40]
#

@agibsonccc
Contributor

How much RAM does your GPU have?

Could you post the output of nvidia-smi?

Thanks!

@Bam4d
Contributor Author

Bam4d commented Apr 21, 2015

Tue Apr 21 11:50:43 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 580     Off  | 0000:04:00.0     N/A |                  N/A |
| 50%   63C    P0    N/A /  N/A |    742MiB /  1535MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0             C    Not Supported                                         |
+-----------------------------------------------------------------------------+

@Bam4d
Contributor Author

Bam4d commented Apr 21, 2015

I wrote a bit of code to throttle the parallel iteration when CUDA is running out of memory, but it still segfaults occasionally. Probably an issue with the driver?

https://github.com/import-io/deeplearning4j/commit/a38a5883e512cf2ffb6e36c2b2d6198ee72baf72
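
The idea, roughly sketched below (this is not the committed code; freeDeviceMemoryBytes() is a hypothetical helper that with JCuda would wrap a free-memory query such as cudaMemGetInfo, and bytesPerBatch is a caller-supplied estimate): wait until the device reports enough free memory before sending the next batch.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of memory-based throttling; the memory query is a placeholder,
// not a real device-memory call.
public class GpuMemoryThrottle {

    private final long bytesPerBatch;

    public GpuMemoryThrottle(long bytesPerBatch) {
        this.bytesPerBatch = bytesPerBatch;
    }

    public <T> void process(List<T> batches, Consumer<T> gpuWork) throws InterruptedException {
        for (T batch : batches) {
            // Back off until there is enough free memory for one more batch.
            while (freeDeviceMemoryBytes() < bytesPerBatch) {
                Thread.sleep(50);
            }
            gpuWork.accept(batch);
        }
    }

    // Placeholder only: returns JVM heap headroom, NOT device memory.
    private long freeDeviceMemoryBytes() {
        return Runtime.getRuntime().freeMemory();
    }
}
```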

@agibsonccc
Contributor

Yeah. This is related to the parallel synchronization.

@agibsonccc
Contributor

This is what I'm seeing running GPUs on Spark:
4:48,414 INFO ~ Training on layer 1 with 10 examples
18:04:48,416 INFO ~ Training on layer 1 with 10 examples
18:04:48,498 INFO ~ Training on layer 1 with 9 examples
18:04:54,169 ERROR ~ Exception in task 4.0 in stage 4.0 (TID 21)
jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.checkResult(BaseCudaDataBuffer.java:336)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.getDevicePointer(BaseCudaDataBuffer.java:185)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.getDevicePointer(BaseCudaDataBuffer.java:57)
at org.nd4j.linalg.jcublas.CublasPointer.<init>(CublasPointer.java:61)
at org.nd4j.linalg.jcublas.SimpleJCublas.copy(SimpleJCublas.java:857)
at org.nd4j.linalg.jcublas.JCublasWrapper.copy(JCublasWrapper.java:66)
at org.nd4j.linalg.api.ndarray.BaseNDArray.dup(BaseNDArray.java:1085)
at org.nd4j.linalg.ops.transforms.Transforms.sqrt(Transforms.java:631)
at org.nd4j.linalg.ops.transforms.Transforms.sqrt(Transforms.java:401)
at org.nd4j.linalg.learning.AdaGrad.getGradient(AdaGrad.java:196)
at org.deeplearning4j.optimize.GradientAdjustment.updateGradientAccordingToParams(GradientAdjustment.java:95)
at org.deeplearning4j.optimize.GradientAdjustment.updateGradientAccordingToParams(GradientAdjustment.java:63)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.updateGradientAccordingToParams(BaseOptimizer.java:269)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:120)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.optimize(BaseOptimizer.java:187)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.layers.BaseLayer.fit(BaseLayer.java:368)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.fit(RBM.java:397)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.pretrain(MultiLayerNetwork.java:217)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1128)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1156)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:66)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:38)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18:04:54,180 ERROR ~ Task 4 in stage 4.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 4.0 failed 1 times, most recent failure: Lost task 4.0 in stage 4.0 (TID 21, localhost): jcuda.CudaException: CUDA_ERROR_OUT_OF_MEMORY
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.checkResult(BaseCudaDataBuffer.java:336)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.getDevicePointer(BaseCudaDataBuffer.java:185)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.getDevicePointer(BaseCudaDataBuffer.java:57)
at org.nd4j.linalg.jcublas.CublasPointer.<init>(CublasPointer.java:61)
at org.nd4j.linalg.jcublas.SimpleJCublas.copy(SimpleJCublas.java:857)
at org.nd4j.linalg.jcublas.JCublasWrapper.copy(JCublasWrapper.java:66)
at org.nd4j.linalg.api.ndarray.BaseNDArray.dup(BaseNDArray.java:1085)
at org.nd4j.linalg.ops.transforms.Transforms.sqrt(Transforms.java:631)
at org.nd4j.linalg.ops.transforms.Transforms.sqrt(Transforms.java:401)
at org.nd4j.linalg.learning.AdaGrad.getGradient(AdaGrad.java:196)
at org.deeplearning4j.optimize.GradientAdjustment.updateGradientAccordingToParams(GradientAdjustment.java:95)
at org.deeplearning4j.optimize.GradientAdjustment.updateGradientAccordingToParams(GradientAdjustment.java:63)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.updateGradientAccordingToParams(BaseOptimizer.java:269)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:120)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.optimize(BaseOptimizer.java:187)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.layers.BaseLayer.fit(BaseLayer.java:368)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.fit(RBM.java:397)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.pretrain(MultiLayerNetwork.java:217)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1128)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1156)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:66)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:38)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
18:04:54,189 ERROR ~ Exception in task 7.0 in stage 4.0 (TID 24)
java.lang.RuntimeException: Could not execute kernel
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:302)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.exec(JCudaExecutioner.java:69)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.execAndReturn(JCudaExecutioner.java:106)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.exec(JCudaExecutioner.java:199)
at org.nd4j.linalg.api.ndarray.BaseNDArray.mean(BaseNDArray.java:3077)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.gradient(RBM.java:184)
at org.deeplearning4j.nn.layers.BaseLayer.gradientAndScore(BaseLayer.java:374)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:119)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.optimize(BaseOptimizer.java:187)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.layers.BaseLayer.fit(BaseLayer.java:368)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.fit(RBM.java:397)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.pretrain(MultiLayerNetwork.java:217)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1128)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1156)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:66)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:38)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: jcuda.CudaException: Could not get function 'mean_strided_float' from module.
Name in module might be mangled. Try adding the line
extern "C"
before the function you want to call, or open the PTX/CUBIN
file with a text editor to find out the mangled function name
at jcuda.utils.KernelLauncher.initFunction(KernelLauncher.java:704)
at jcuda.utils.KernelLauncher.load(KernelLauncher.java:442)
at org.nd4j.linalg.jcublas.kernel.KernelFunctionLoader.get(KernelFunctionLoader.java:98)
at org.nd4j.linalg.jcublas.kernel.KernelFunctionLoader.launcher(KernelFunctionLoader.java:89)
at org.nd4j.linalg.jcublas.kernel.KernelFunctions.invoke(KernelFunctions.java:112)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invokeFunction(JCudaExecutioner.java:315)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:300)
... 29 more
Suppressed: jcuda.CudaException: INVALID CUresult: 30
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.checkResult(BaseCudaDataBuffer.java:336)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.freeDevicePointer(BaseCudaDataBuffer.java:345)
at org.nd4j.linalg.jcublas.CublasPointer.close(CublasPointer.java:28)
at org.nd4j.linalg.jcublas.util.KernelParamsWrapper.close(KernelParamsWrapper.java:150)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:301)
... 29 more
Caused by: jcuda.CudaException: Could not get function 'mean_strided_float' from module.
Name in module might be mangled. Try adding the line
extern "C"
before the function you want to call, or open the PTX/CUBIN
file with a text editor to find out the mangled function name
at jcuda.utils.KernelLauncher.initFunction(KernelLauncher.java:699)
... 35 more
18:04:54,202 ERROR ~ Error occurred while fetching local blocks
java.io.FileNotFoundException: /tmp/spark-e3491d08-b1e5-425e-916f-25717066d806/blockmgr-0219b766-6db7-4020-ab28-a5d592c2d429/30/shuffle_0_0_0.index (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:304)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:235)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:269)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:115)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:76)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:93)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:92)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:58)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:38)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18:04:54,376 ERROR ~ Exception in task 2.0 in stage 4.0 (TID 19)
java.lang.RuntimeException: Could not execute kernel
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:302)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.exec(JCudaExecutioner.java:69)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.execAndReturn(JCudaExecutioner.java:106)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.exec(JCudaExecutioner.java:199)
at org.nd4j.linalg.api.ndarray.BaseNDArray.mean(BaseNDArray.java:3077)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.gradient(RBM.java:184)
at org.deeplearning4j.nn.layers.BaseLayer.gradientAndScore(BaseLayer.java:374)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:119)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.optimize(BaseOptimizer.java:187)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.layers.BaseLayer.fit(BaseLayer.java:368)
at org.deeplearning4j.nn.layers.feedforward.rbm.RBM.fit(RBM.java:397)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.pretrain(MultiLayerNetwork.java:217)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1128)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1156)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:66)
at org.deeplearning4j.spark.impl.multilayer.IterativeReduceFlatMap.call(IterativeReduceFlatMap.java:38)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: jcuda.CudaException: Could not get function 'mean_strided_float' from module.
Name in module might be mangled. Try adding the line
extern "C"
before the function you want to call, or open the PTX/CUBIN
file with a text editor to find out the mangled function name
at jcuda.utils.KernelLauncher.initFunction(KernelLauncher.java:704)
at jcuda.utils.KernelLauncher.load(KernelLauncher.java:442)
at org.nd4j.linalg.jcublas.kernel.KernelFunctionLoader.get(KernelFunctionLoader.java:98)
at org.nd4j.linalg.jcublas.kernel.KernelFunctionLoader.launcher(KernelFunctionLoader.java:89)
at org.nd4j.linalg.jcublas.kernel.KernelFunctions.invoke(KernelFunctions.java:112)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invokeFunction(JCudaExecutioner.java:315)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:300)
... 29 more
Suppressed: jcuda.CudaException: INVALID CUresult: 30
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.checkResult(BaseCudaDataBuffer.java:336)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.freeDevicePointer(BaseCudaDataBuffer.java:345)
at org.nd4j.linalg.jcublas.CublasPointer.close(CublasPointer.java:28)
at org.nd4j.linalg.jcublas.util.KernelParamsWrapper.close(KernelParamsWrapper.java:150)
at org.nd4j.linalg.jcublas.ops.executioner.JCudaExecutioner.invoke(JCudaExecutioner.java:301)
... 29 more
Caused by: jcuda.CudaException: Could not get function 'mean_strided_float' from module.
Name in module might be mangled. Try adding the line
extern "C"
before the function you want to call, or open the PTX/CUBIN
file with a text editor to find out the mangled function name
at jcuda.utils.KernelLauncher.initFunction(KernelLauncher.java:699)
... 35 more

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f68711d5234, pid=7856, tid=140081833699072
#
# JRE version: OpenJDK Runtime Environment (8.0_45-b13) (build 1.8.0_45-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libcublas.so.7.0+0x21234]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/buildbot/hs_err_pid7856.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Aborted

@agibsonccc
Contributor

We've stabilized this. Closing.
