In [6]:
from theano import tensor as T
from theano import function

import numpy as np

### known_grads

`known_grads` is a dictionary that maps variables to the expressions to use as the gradient of the cost with respect to those variables, instead of the gradients automatically derived from the cost. For instance:

In [15]:
x = T.vector('x')
y = x ** 3 + x ** 2
z = 6 * y + x

# normal gradient
g1 = T.grad(z.mean(), x)

# with known_grads, d(cost)/dy is taken to be 2 * x, so
# d(cost)/dx = (2 * x) * (3 * x ** 2 + 2 * x) + the direct term from the "+ x" in z
g2 = T.grad(z.mean(), x, known_grads={y : 2 * x})

f1 = function([x], [y, z, g1])
f2 = function([x], [y, z, g2])

In [14]:
x_val = np.arange(2, dtype='float32')
print(f1(x_val))
print(f2(x_val))

[array([ 0.,  2.], dtype=float32), array([  0.,  13.], dtype=float32), array([  0.5,  15.5], dtype=float32)]
[array([ 0.,  2.], dtype=float32), array([  0.,  13.], dtype=float32), array([ 0.,  2.], dtype=float32)]
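
The "normal" gradient `g1` can be checked without Theano. The following is a minimal NumPy sketch, not part of the original notebook: `cost`, `analytic_grad` and `numeric_grad` are hypothetical helper names, and the comparison uses central finite differences rather than symbolic differentiation. Because the cost is `z.mean()`, every component of the gradient carries a factor of `1/n`, which is where the values 0.5 and 15.5 come from.

```python
import numpy as np

def cost(x):
    # Same graph as the cell above: y = x**3 + x**2, z = 6*y + x, cost = mean(z)
    y = x ** 3 + x ** 2
    z = 6 * y + x
    return z.mean()

def analytic_grad(x):
    # Chain rule by hand: d(mean(z))/dx_i = (6 * (3*x_i**2 + 2*x_i) + 1) / n
    return (6 * (3 * x ** 2 + 2 * x) + 1) / x.size

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, perturbing one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

# float64 instead of float32 so the finite differences are accurate enough
x_val = np.arange(2, dtype='float64')
print(analytic_grad(x_val))       # components 0.5 and 15.5
print(numeric_grad(cost, x_val))  # approximately the same values
```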


"known_grads" specifies expressions to use for the gradient of the cost 
wrt some variables, instead of the ones automatically derived from the 
cost. 

For instance, take a = x**3 + x**2 and y = 6*a + x (the same graph as 
above, with a and y playing the roles of y and z). Then 
"known_grads={a: 2*x}" means that, during backpropagation, the gradient 
wrt a, dC/da, is taken to be 2*x instead of the expression derived from 
the provided cost y.sum(), that is, d(y.sum())/da. 

In that particular case, without known_grads, we would have: 

d(y.sum())/da = 6 * ones_like(a) 
d(y.sum())/dx = ones_like(x) + d(y.sum())/da * da/dx 
              = 1 + 6 * (3 * x**2 + 2 * x) 

Evaluated at x = [1, 2], this gives [31, 97]. 

Now, if you replace d(y.sum())/da by 2*x, the last formula becomes: 

dC/dx = ones_like(x) + 2*x * da/dx 
      = 1 + 2*x * (3 * x**2 + 2 * x) 

Evaluated at x = [1, 2], this gives [11, 65]. 
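
The two evaluations above can be reproduced by hand. This is a minimal NumPy sketch that only redoes the chain-rule arithmetic; it does not call Theano. It assumes a = x**3 + x**2 and y = 6*a + x, which is inferred from the formulas rather than stated in the text, and the names `auto` and `overridden` are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0])

# Assumed graph (inferred from the formulas): a = x**3 + x**2, y = 6*a + x
da_dx = 3 * x ** 2 + 2 * x   # derivative of a wrt x
direct = np.ones_like(x)     # contribution of the "+ x" term in y

# Automatically derived gradient: d(y.sum())/da = 6
auto = direct + 6 * np.ones_like(x) * da_dx

# Gradient wrt a overridden to 2*x, as with known_grads={a: 2*x}
overridden = direct + (2 * x) * da_dx

print(auto)        # values 31 and 97
print(overridden)  # values 11 and 65
```

The only difference between the two results is the factor multiplying da/dx; the direct contribution of x to the cost is unaffected by the override.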