DL4J: LRN doesn't match MKL-DNN implementation, libnd4j implementation incorrect #7272

Open
AlexDBlack opened this issue Mar 9, 2019 · 2 comments

AlexDBlack commented Mar 9, 2019

While implementing MKL-DNN support for DL4J, I found that the MKL-DNN implementation doesn't produce the same result as DL4J for backprop; the forward pass is fine.
After digging further, I can't rule out that both the DL4J and the libnd4j LRN backprop implementations are incorrect here.
DL4J: for example, setting alpha=1, k=0, beta=1 (about the simplest possible LRN configuration) causes gradient checks to fail when MKL-DNN is not used. Smaller values of alpha (<= 1e-4) pass gradient checks; I suspect that with a small enough alpha, the incorrect component of the gradient calculation is simply too small to trigger a test failure.
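
For anyone wanting to reproduce: a minimal sketch of the failing configuration. Exact builder/utility signatures vary a bit between releases (GradientCheckUtil later gained config-object overloads), so treat this as illustrative rather than a drop-in test:

```java
import org.deeplearning4j.gradientcheck.GradientCheckUtil;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.LocalResponseNormalization;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.NoOp;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class LrnGradCheck {
    public static void main(String[] args) {
        // Gradient checks require double precision
        Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE);

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .updater(new NoOp())    // params must stay fixed during the numerical check
                .list()
                // The "simplest possible" LRN case described above: alpha=1, k=0, beta=1
                .layer(new LocalResponseNormalization.Builder().k(0.0).alpha(1.0).beta(1.0).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX).nOut(10).build())
                .setInputType(InputType.convolutional(4, 4, 3))
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();

        // Minibatch of 2, NCHW input matching the convolutional input type above
        INDArray input = Nd4j.rand(new int[]{2, 3, 4, 4});
        INDArray labels = Nd4j.zeros(2, 10);
        labels.putScalar(0, 0, 1.0);
        labels.putScalar(1, 1, 1.0);

        boolean ok = GradientCheckUtil.checkGradients(net, 1e-6, 1e-3, 1e-8,
                false, false, input, labels);
        System.out.println("Gradient check passed: " + ok);   // false for alpha=1, k=0, beta=1
    }
}
```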

When MKL-DNN is enabled, I see no evidence that it is actually used in double precision (as required for gradient checks), only in float precision - i.e., there is no output when the MKLDNN_VERBOSE=1 environment variable is set. This suggests the libnd4j helper implementation is being used instead, and it is also incorrect: gradient checks fail when it is used, including cases that DL4J itself passes.
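
For reference, this is how the precision gets switched for that check (a sketch; the Nd4j method naming differs across releases). MKLDNN_VERBOSE=1 has to be set in the environment before the JVM starts:

```java
// Launch with the env variable set, e.g.:  MKLDNN_VERBOSE=1 java -cp ... LrnGradCheck
// mkl-dnn prints one line per primitive execution when verbose mode is enabled.
Nd4j.setDefaultDataTypes(DataType.FLOAT, DataType.FLOAT);   // float: verbose output appears (MKL-DNN used)
Nd4j.setDefaultDataTypes(DataType.DOUBLE, DataType.DOUBLE); // double: no output (libnd4j helper fallback)
```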

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf Section 3.3
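
For reference, the forward pass being differentiated is (in the paper's notation):

```latex
% LRN forward pass (Krizhevsky et al., Sec. 3.3): b^i_{x,y} is the normalized
% activity at channel i, spatial position (x,y); the sum runs over n adjacent
% channels out of N total. k, alpha, beta, n are the hyperparameters above.
b^i_{x,y} = a^i_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big( a^j_{x,y} \big)^2 \Big)^{\beta}
```

With k=0, alpha=1, beta=1 this reduces to b^i = a^i / sum_j (a^j)^2 over the channel window, which is exactly the failing case above.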

Aha! Link: https://skymindai.aha.io/features/DL4J-5

AlexDBlack added the Bug and DL4J labels Mar 9, 2019
AlexDBlack self-assigned this Mar 9, 2019

raver119 commented Mar 9, 2019

The libnd4j LRN backprop being incorrect is a known issue, and a fix is a work in progress.

raver119 added the C++ label Mar 9, 2019

AlexDBlack (author) commented:

Right, I'm aware; I need to derive the math for that. DL4J also being wrong is the main new thing here.

AlexDBlack added a commit that referenced this issue Mar 12, 2019
* Fix issues with shared context for MKLDNNConvHelper

* Test tweaks

* MKL-DNN subsampling layer helper implementation (not yet passing)

* First validation test + validation utils

* Fixes

* Conv2d MKL-DNN helper

* Batch norm helper + test (fwd only so far, but passing)

* MKLDNN LRN helper

* Fix MKL-DNN conv helper

* Javadoc; disable LRN until fixed

* Properly check for conv/sub helper supported

* MKLDNNConv OpContext

* OpContext changes for batch norm, subsampling

* LRN op context

* Disable LRN helper for now

* Disable LRN test for now - issue #7272

* Batch norm fixes

* Various final (hopefully) fixes - MKL-DNN

* Last fix

raver119 removed the C++ label Nov 10, 2019