
DL4J CuDNN: don't fallback if exception is memory error #6691

Closed · AlexDBlack opened this issue on Nov 9, 2018 · 1 comment · Fixed by #6729

Comments

@AlexDBlack (Contributor) commented Nov 9, 2018

CuDNN implementations fall back on the built-in implementation if an exception is encountered during CuDNN execution.

This makes sense, except when the exception is a memory error; in that case we should simply propagate the exception immediately, since the fallback would most likely hit the same allocation failure. (A sketch of the intended check appears after the stack trace below.)

2018-11-09 11:45:27 WARN  BatchNormalization:370 - CuDNN BatchNormalization forward pass execution failed - falling back on built-in implementation
java.lang.RuntimeException: Failed to allocate 411041792 bytes from DEVICE [0] memory
	at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:76)
	at org.nd4j.jita.workspace.CudaWorkspace.alloc(CudaWorkspace.java:213)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:471)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:416)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:255)
	at org.nd4j.linalg.jcublas.buffer.CudaFloatDataBuffer.<init>(CudaFloatDataBuffer.java:61)
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createFloat(CudaDataBufferFactory.java:331)
	at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1500)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:285)
	at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:120)
	at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.createUninitialized(JCublasNDArrayFactory.java:165)
	at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4442)
	at org.nd4j.linalg.workspace.BaseWorkspaceMgr.createUninitialized(BaseWorkspaceMgr.java:288)
	at org.deeplearning4j.nn.layers.normalization.CudnnBatchNormalizationHelper.preOutput(CudnnBatchNormalizationHelper.java:249)
	at org.deeplearning4j.nn.layers.normalization.BatchNormalization.preOutput(BatchNormalization.java:365)
	at org.deeplearning4j.nn.layers.normalization.BatchNormalization.activate(BatchNormalization.java:317)
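
A minimal sketch of the idea (class and method names here are illustrative, not the actual DL4J code): walk the exception's cause chain, treat allocation failures as memory errors, and rethrow them instead of falling back.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class CudnnFallbackSketch {
        private static final Logger log = LoggerFactory.getLogger(CudnnFallbackSketch.class);

        // Hypothetical check: treat an OutOfMemoryError anywhere in the cause
        // chain, or an exception whose message indicates a device allocation
        // failure, as a memory error that must not trigger the fallback.
        static boolean isMemoryError(Throwable t) {
            for (Throwable cur = t; cur != null; cur = cur.getCause()) {
                if (cur instanceof OutOfMemoryError) {
                    return true;
                }
                String msg = cur.getMessage();
                if (msg != null && msg.contains("Failed to allocate")) {
                    return true;
                }
            }
            return false;
        }

        // Illustrative call site mirroring the fallback pattern described
        // above: any CuDNN failure falls back to the built-in implementation,
        // except memory errors, which are propagated immediately.
        void forwardPass(Runnable cudnnImpl, Runnable builtInImpl) {
            try {
                cudnnImpl.run();
            } catch (RuntimeException e) {
                if (isMemoryError(e)) {
                    throw e; // falling back would likely fail with the same OOM
                }
                log.warn("CuDNN forward pass failed - falling back on built-in implementation", e);
                builtInImpl.run();
            }
        }
    }

In practice such a check would sit in the catch block at each CuDNN helper call site (e.g. CudnnBatchNormalizationHelper), so that genuine CuDNN execution failures still fall back while device memory errors surface to the user immediately.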
AlexDBlack added a commit that referenced this issue Nov 21, 2018
Various DL4J/ND4J fixes (#6729)
* #6728 InvertMatrix edge case fix

* #6671 Updaters - use isRowVectorOrScalar when appropriate

* #6717 Report (optionally) GC info in PerformanceListener

* #6714 Don't generate instance ID for ops when using INDArray constructors

* #6712 Fix FileDocumentIterator handling of empty documents

* #6686 ArchiveUtils listing, single file extraction for zip

* SameDiff: don't make output variables placeholders if shape is unknown

* #6674 SameDiff FlatBuffers: persist placeholder variables

* #6691 DL4J CuDNN: don't fall back to built-in if exception is OOM

* Add graph.fbs changes (forgotten in previous commit)

* LRN builder constructor fix

* Fix test typo

* Dependency management for nd4j-kryo jackson versions to (try to) avoid CI dependency issue
@lock (bot) commented Dec 21, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators on Dec 21, 2018
