This repository was archived by the owner on Jun 9, 2021. It is now read-only.
Incorrect gradients for kernels, tf.nn.conv2d #230
Open
Description
Problem:
I am getting incorrect gradients for kernels when using tf.nn.conv2d.
The dimensions are correct, and the values seem correct but shuffled.
I'm running on an M1. (I haven't tested this version of TensorFlow on non-macOS.)
Minimal code to reproduce:
import numpy as np
import tensorflow as tf

print(f'{tf.__version__ =}')

def rowmaj(s):
    return np.arange(np.prod(s)).reshape(s).astype(np.float32)

# Error happens for c > 1
n, h, w, c = 1, 3, 3, 2
kh, kw, c, nk = 2, 2, c, 1

A_tf = tf.Variable(rowmaj([n, h, w, c]))  # images
B_tf = tf.Variable(rowmaj([kh, kw, c, nk]))  # kernels

with tf.GradientTape() as tape:
    y_tf = tf.nn.conv2d(A_tf, B_tf, padding='VALID', strides=1)
A_grad, B_grad = tape.gradient(y_tf, [A_tf, B_tf])

# A is fine:
expected_A_grad_flat = np.array([0, 1, 2, 4, 2, 3, 4, 6, 12, 16, 8, 10, 4, 5, 10, 12, 6, 7])
assert np.all(A_grad.numpy().flatten() == expected_A_grad_flat)

# B is not fine:
expected_B_grad_flat = np.array([16, 20, 24, 28, 40, 44, 48, 52])
assert np.all(B_grad.numpy().flatten() == expected_B_grad_flat), \
    'Get unexpected gradient value.\n' + \
    f'{B_grad.numpy().flatten() = }\n{expected_B_grad_flat = }'

Output:
tf.__version__ ='2.4.0-rc0'
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-2-10c7037c287e> in <module>
24 # B is not fine:
25 expected_B_grad_flat = np.array([16, 20, 24, 28, 40, 44, 48, 52])
---> 26 assert np.all(B_grad.numpy().flatten() == expected_B_grad_flat), \
27 'Get unexpected gradient value.\n' + \
28 f'{B_grad.numpy().flatten() = }\n{expected_B_grad_flat = }'
AssertionError: Get unexpected gradient value.
B_grad.numpy().flatten() = array([16., 24., 40., 48., 20., 28., 44., 52.], dtype=float32)
expected_B_grad_flat = array([16, 20, 24, 28, 40, 44, 48, 52])
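The returned values look like the expected gradient with the channel axis moved to the front of the kernel dimensions. A quick check of that hypothesis against the numbers above (dropping the trailing nk=1 axis; this is an observation about the output, not a confirmed root cause):

import numpy as np
expected = np.array([16, 20, 24, 28, 40, 44, 48, 52], dtype=np.float32)
returned = np.array([16, 24, 40, 48, 20, 28, 44, 52], dtype=np.float32)
# Reshape to (kh, kw, c) and move the channel axis to the front:
assert np.all(expected.reshape(2, 2, 2).transpose(2, 0, 1).flatten() == returned)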
Code to test arbitrary valid input sizes:
import numpy as np
import tensorflow as tf
import smallpebble as sp
from smallpebble.tests.test_smallpebble import numgrads

print(f'{tf.__version__ =}')
print(f'{sp.__version__ =}')

n, h, w, c = 10, 10, 10, 2
kh, kw, c, nk = 5, 4, c, 1

def rowmaj(s):
    return np.arange(np.prod(s)).reshape(s).astype(np.float32)

def conv2d(A, B):
    return tf.nn.conv2d(A, B, padding='VALID', strides=1).numpy()

A = rowmaj([n, h, w, c])  # images
B = rowmaj([kh, kw, c, nk])  # kernels
A_tf = tf.Variable(A, dtype=tf.float32)
B_tf = tf.Variable(B, dtype=tf.float32)

# Numerical gradients, to compare against:
grad_a, grad_b = numgrads(conv2d, [A, B], n=1, delta=1)

with tf.GradientTape() as tape:
    y_tf = tf.nn.conv2d(A_tf, B_tf, padding='VALID', strides=1)
Ag_tf, Bg_tf = tape.gradient(y_tf, [A_tf, B_tf])

print(f'{np.all((Ag_tf - grad_a) == 0) = }')
print(f'{np.all((Bg_tf - grad_b) == 0) = }')
print('expected gradient of B (flat):', grad_b.flatten())
print('tensorflow gradient of B (flat):', Bg_tf.numpy().flatten())

Output:
tf.__version__ ='2.4.0-rc0'
sp.__version__ ='2.0.0'
np.all((Ag_tf - grad_a) == 0) = True
np.all((Bg_tf - grad_b) == 0) = False
expected gradient of B (flat): [401520. 401940. 402360. 402780. 403200. 403620. 404040. 404460. 409920.
410340. 410760. 411180. 411600. 412020. 412440. 412860. 418320. 418740.
419160. 419580. 420000. 420420. 420840. 421260. 426720. 427140. 427560.
427980. 428400. 428820. 429240. 429660. 435120. 435540. 435960. 436380.
436800. 437220. 437640. 438060.]
tensorflow gradient of B (flat): [401520. 402360. 403200. 404040. 409920. 410760. 411600. 412440. 418320.
419160. 420000. 420840. 426720. 427560. 428400. 429240. 435120. 435960.
436800. 437640. 401940. 402780. 403620. 404460. 410340. 411180. 412020.
412860. 418740. 419580. 420420. 421260. 427140. 427980. 428820. 429660.
435540. 436380. 437220. 438060.]
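For anyone without smallpebble installed, here is a minimal sketch of the same check using only numpy: since tape.gradient of a non-scalar target sums over all outputs, the expected kernel gradient is just the sum of the input patch at every valid output position. (check_kernel_grad is a helper name I've made up for this sketch.)

import numpy as np
import tensorflow as tf

def check_kernel_grad(n, h, w, c, kh, kw, nk):
    A = np.arange(n * h * w * c, dtype=np.float32).reshape(n, h, w, c)
    B = np.arange(kh * kw * c * nk, dtype=np.float32).reshape(kh, kw, c, nk)
    # The analytic gradient of sum(y) w.r.t. B[i, j, c, k] is the sum of
    # A[n, p+i, q+j, c] over all batches and output positions (p, q),
    # which is the same for every output channel k.
    expected = np.zeros_like(B)
    for p in range(h - kh + 1):
        for q in range(w - kw + 1):
            patch = A[:, p:p + kh, q:q + kw, :].sum(axis=0)  # (kh, kw, c)
            expected += patch[..., None]  # broadcast over nk
    B_tf = tf.Variable(B)
    with tf.GradientTape() as tape:
        y = tf.nn.conv2d(tf.constant(A), B_tf, padding='VALID', strides=1)
    B_grad = tape.gradient(y, B_tf).numpy()
    return np.array_equal(B_grad, expected)

print(check_kernel_grad(1, 3, 3, 2, 2, 2, 1))  # True on an unaffected build; False here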