Incorrect gradients for kernels, tf.nn.conv2d #230

**Problem:**
I am getting incorrect gradients for the kernels when using `tf.nn.conv2d`.
The gradient's dimensions are correct, and its values appear correct *but shuffled*.
I'm running on an M1 Mac.
(I haven't tested this version of TensorFlow on non-macOS platforms.)

**Minimal code to reproduce:**

```python
import numpy as np
import tensorflow as tf
print(f'{tf.__version__ =}')

def rowmaj(s):
    return np.arange(np.prod(s)).reshape(s).astype(np.float32)  # np.product was removed in NumPy 2.0

# Error happens for c>1
n, h, w, c = 1, 3, 3, 2
kh, kw, c, nk = 2, 2, c, 1

A_tf = tf.Variable(rowmaj([n, h, w, c]))  # images
B_tf = tf.Variable(rowmaj([kh, kw, c, nk]))  # kernels

with tf.GradientTape() as tape:
    y_tf = tf.nn.conv2d(A_tf, B_tf, padding='VALID', strides=1)

A_grad, B_grad = tape.gradient(y_tf, [A_tf, B_tf])

# A is fine:
expected_A_grad_flat = np.array([0, 1, 2, 4, 2, 3, 4, 6, 12, 16, 8, 10, 4, 5, 10, 12, 6, 7])
assert np.all(A_grad.numpy().flatten() == expected_A_grad_flat)

# B is not fine:
expected_B_grad_flat = np.array([16, 20, 24, 28, 40, 44, 48, 52])
assert np.all(B_grad.numpy().flatten() == expected_B_grad_flat), \
    'Get unexpected gradient value.\n' + \
    f'{B_grad.numpy().flatten() = }\n{expected_B_grad_flat = }'
```
Output:
```
tf.__version__ ='2.4.0-rc0'

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-10c7037c287e> in <module>
     24 # B is not fine:
     25 expected_B_grad_flat = np.array([16, 20, 24, 28, 40, 44, 48, 52])
---> 26 assert np.all(B_grad.numpy().flatten() == expected_B_grad_flat), \
     27     'Get unexpected gradient value.\n' + \
     28     f'{B_grad.numpy().flatten() = }\n{expected_B_grad_flat = }'

AssertionError: Get unexpected gradient value.
B_grad.numpy().flatten() = array([16., 24., 40., 48., 20., 28., 44., 52.], dtype=float32)
expected_B_grad_flat = array([16, 20, 24, 28, 40, 44, 48, 52])
```
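
Comparing the two flattened arrays in the output above suggests the shuffle is a fixed permutation, not a random one: the values look as if they were laid out channel-first, `(c, kh, kw, nk)`, but returned under the `(kh, kw, c, nk)` shape. The NumPy-only sketch below checks that hypothesis against the printed values. It is an observation about this one output (with `nk = 1`), not a confirmed root cause, and the names `observed`, `expected`, `channel_first` and `recovered` are introduced here for illustration only.

```python
import numpy as np

kh, kw, c, nk = 2, 2, 2, 1

# Values copied from the assertion message above.
expected = np.array([16, 20, 24, 28, 40, 44, 48, 52], dtype=np.float32)
observed = np.array([16, 24, 40, 48, 20, 28, 44, 52], dtype=np.float32)

# Hypothesis: the returned buffer holds the expected gradient in
# (c, kh, kw, nk) order, even though its shape says (kh, kw, c, nk).
channel_first = expected.reshape(kh, kw, c, nk).transpose(2, 0, 1, 3)
matches = np.array_equal(channel_first.flatten(), observed)
print(f'{matches = }')

# Under that hypothesis, reinterpreting the buffer as (c, kh, kw, nk)
# and moving the channel axis back recovers the expected gradient.
recovered = observed.reshape(c, kh, kw, nk).transpose(1, 2, 0, 3)
print(f'{np.array_equal(recovered.flatten(), expected) = }')
```

Both checks hold for the values printed above. Whether the same permutation applies for `nk > 1` is untested here.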

**Code to test arbitrary valid input sizes:**

```python
import numpy as np
import tensorflow as tf
import smallpebble as sp
from smallpebble.tests.test_smallpebble import numgrads
print(f'{tf.__version__ =}')
print(f'{sp.__version__ =}')

n, h, w, c = 10, 10, 10, 2
kh, kw, c, nk = 5, 4, c, 1

def rowmaj(s):
    return np.arange(np.prod(s)).reshape(s).astype(np.float32)  # np.product was removed in NumPy 2.0

def conv2d(A, B):
    return tf.nn.conv2d(A, B, padding='VALID', strides=1).numpy()

A = rowmaj([n, h, w, c])  # images
B = rowmaj([kh, kw, c, nk])  # kernels

A_tf = tf.Variable(A, dtype=tf.float32)
B_tf = tf.Variable(B, dtype=tf.float32)

grad_a, grad_b = numgrads(conv2d, [A, B], n=1, delta=1)

with tf.GradientTape() as tape:
    y_tf = tf.nn.conv2d(A_tf, B_tf, padding='VALID', strides=1)

Ag_tf, Bg_tf = tape.gradient(y_tf, [A_tf, B_tf])

print(f'{np.all((Ag_tf - grad_a) == 0) = }')
print(f'{np.all((Bg_tf - grad_b) == 0) = }')

print('expected gradient of B (flat):', grad_b.flatten())
print('tensorflow gradient of B (flat):', Bg_tf.numpy().flatten())
```
Output:
```
tf.__version__ ='2.4.0-rc0'
sp.__version__ ='2.0.0'
np.all((Ag_tf - grad_a) == 0) = True
np.all((Bg_tf - grad_b) == 0) = False
expected gradient of B (flat): [401520. 401940. 402360. 402780. 403200. 403620. 404040. 404460. 409920.
 410340. 410760. 411180. 411600. 412020. 412440. 412860. 418320. 418740.
 419160. 419580. 420000. 420420. 420840. 421260. 426720. 427140. 427560.
 427980. 428400. 428820. 429240. 429660. 435120. 435540. 435960. 436380.
 436800. 437220. 437640. 438060.]
tensorflow gradient of B (flat): [401520. 402360. 403200. 404040. 409920. 410760. 411600. 412440. 418320.
 419160. 420000. 420840. 426720. 427560. 428400. 429240. 435120. 435960.
 436800. 437640. 401940. 402780. 403620. 404460. 410340. 411180. 412020.
 412860. 418740. 419580. 420420. 421260. 427140. 427980. 428820. 429660.
 435540. 436380. 437220. 438060.]
```
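
The expected values can also be checked without `smallpebble`. Because `tape.gradient` is called on a non-scalar `y_tf` with no `output_gradients`, it returns the gradient of the sum of `y_tf`. For a stride-1 `VALID` convolution, that gradient with respect to kernel entry `[i, j, ch, k]` is the sum of every input value that entry multiplies. The sketch below is a plain NumPy reference under those assumptions. `kernel_grad_reference` is a hypothetical helper, not part of TensorFlow or smallpebble.

```python
import numpy as np

def rowmaj(s):
    return np.arange(np.prod(s)).reshape(s).astype(np.float64)

def kernel_grad_reference(A, kernel_shape):
    """Gradient of sum(conv2d(A, B)) with respect to B.

    Assumes NHWC input, (kh, kw, c, nk) kernel, stride 1, VALID padding.
    """
    n, h, w, c = A.shape
    kh, kw, kc, nk = kernel_shape
    assert kc == c, 'kernel channels must match input channels'
    oh, ow = h - kh + 1, w - kw + 1  # output height and width
    grad = np.zeros(kernel_shape, dtype=A.dtype)
    for i in range(kh):
        for j in range(kw):
            # Kernel entry (i, j, ch, k) multiplies A[:, i:i+oh, j:j+ow, ch]
            # exactly once per output position, for every filter k.
            window_sum = A[:, i:i + oh, j:j + ow, :].sum(axis=(0, 1, 2))
            grad[i, j, :, :] = window_sum[:, None]
    return grad

small = kernel_grad_reference(rowmaj([1, 3, 3, 2]), (2, 2, 2, 1))
large = kernel_grad_reference(rowmaj([10, 10, 10, 2]), (5, 4, 2, 1))
print('small case (flat):', small.flatten())
print('large case, first four (flat):', large.flatten()[:4])
```

For both input sizes used in this report, the reference agrees with the expected gradients listed above, not with the order TensorFlow returned.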
