
R_op is run on zero_grad nodes. #5792

Open
botev opened this issue Mar 31, 2017 · 5 comments · May be fixed by #5990


botev commented Mar 31, 2017

So, I keep getting this annoying error:

NotImplementedError: Prod{axis=None, dtype='int64', acc_dtype='int64'} of class Prod does not implement R_op. If this is a theano op, write to the theano-dev mailing list for assistance. If it is your own op, implement the R_op method.

This happens because the graph for the MRG Gaussian sampling contains a Prod operator. I've tried wrapping the sampled random variables in zero_grad, but Theano still tries to apply the R_op to them. Currently the code looks like this:

        x, x_out, _ = self.last_calc
        theano.printing.debugprint(x)
        Jv = T.Rop(x, params, v)

The graph output is:

dot [id A] ''   
 |Join [id B] ''   
 | |TensorConstant{1} [id C]
 | |Reshape{2} [id D] ''   
 | | |Elemwise{add,no_inplace} [id E] ''   
 | | | |InplaceDimShuffle{0,x,1} [id F] ''   
 | | | | |Subtensor{::, :int64:} [id G] ''   
 | | | |   |Join [id H] ''   
 | | | |   | |TensorConstant{1} [id C]
 | | | |   | |Subtensor{::, :int64:} [id I] ''   
 | | | |   | | |dot [id J] ''   
 | | | |   | | | |Join [id K] ''   
 | | | |   | | | | |TensorConstant{1} [id C]
 | | | |   | | | | |x.input [id L]
 | | | |   | | | | |Alloc [id M] ''   
 | | | |   | | | |   |TensorConstant{1.0} [id N]
 | | | |   | | | |   |Subtensor{int64} [id O] ''   
 | | | |   | | | |   | |Shape [id P] ''   
 | | | |   | | | |   | | |x.input [id L]
 | | | |   | | | |   | |Constant{0} [id Q]
 | | | |   | | | |   |TensorConstant{1} [id C]
 | | | |   | | | |encode_1::W [id R]
 | | | |   | | |ScalarFromTensor [id S] ''   
 | | | |   | |   |Elemwise{int_div,no_inplace} [id T] ''   
 | | | |   | |     |Subtensor{int64} [id U] ''   
 | | | |   | |     | |Shape [id V] ''   
 | | | |   | |     | | |dot [id J] ''   
 | | | |   | |     | |Constant{1} [id W]
 | | | |   | |     |TensorConstant{2} [id X]
 | | | |   | |Elemwise{add,no_inplace} [id Y] ''   
 | | | |   |   |softplus [id Z] ''   
 | | | |   |   | |Subtensor{::, int64::} [id BA] ''   
 | | | |   |   |   |dot [id J] ''   
 | | | |   |   |   |ScalarFromTensor [id BB] ''   
 | | | |   |   |     |Elemwise{int_div,no_inplace} [id T] ''   
 | | | |   |   |InplaceDimShuffle{x,x} [id BC] ''   
 | | | |   |     |TensorConstant{1e-06} [id BD]
 | | | |   |ScalarFromTensor [id BE] ''   
 | | | |     |Elemwise{int_div,no_inplace} [id BF] ''   
 | | | |       |Subtensor{int64} [id BG] ''   
 | | | |       | |Shape [id BH] ''   
 | | | |       | | |Join [id H] ''   
 | | | |       | |Constant{1} [id BI]
 | | | |       |TensorConstant{2} [id X]
 | | | |Elemwise{mul,no_inplace} [id BJ] ''   
 | | |   |InplaceDimShuffle{0,x,1} [id BK] ''   
 | | |   | |Subtensor{::, int64::} [id BL] ''   
 | | |   |   |Join [id H] ''   
 | | |   |   |ScalarFromTensor [id BM] ''   
 | | |   |     |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |   |ZeroGrad [id BN] ''   
 | | |     |Elemwise{add,no_inplace} [id BO] ''   
 | | |       |InplaceDimShuffle{x,x,x} [id BP] ''   
 | | |       | |Elemwise{Cast{float64}} [id BQ] ''   
 | | |       |   |TensorConstant{0.0} [id BR]
 | | |       |Elemwise{mul,no_inplace} [id BS] ''   
 | | |         |InplaceDimShuffle{x,x,x} [id BT] ''   
 | | |         | |Elemwise{Cast{float64}} [id BU] ''   
 | | |         |   |TensorConstant{1.0} [id BV]
 | | |         |Reshape{3} [id BW] ''   
 | | |           |Subtensor{:int64:} [id BX] ''   
 | | |           | |Join [id BY] ''   
 | | |           | | |TensorConstant{0} [id BZ]
 | | |           | | |Elemwise{mul,no_inplace} [id CA] ''   
 | | |           | | | |Elemwise{sqrt,no_inplace} [id CB] ''   
 | | |           | | | | |Elemwise{mul,no_inplace} [id CC] ''   
 | | |           | | | |   |InplaceDimShuffle{x} [id CD] ''   
 | | |           | | | |   | |TensorConstant{-2.0} [id CE]
 | | |           | | | |   |Elemwise{log,no_inplace} [id CF] ''   
 | | |           | | | |     |Subtensor{:int64:} [id CG] ''   
 | | |           | | | |       |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | | |       | |Elemwise{mul,no_inplace} [id CI] ''   
 | | |           | | | |       | | |mrg_uniform{TensorType(float64, vector),no_inplace}.1 [id CJ] ''   
 | | |           | | | |       | | | |<TensorType(int32, matrix)> [id CK]
 | | |           | | | |       | | | |MakeVector{dtype='int64'} [id CL] ''   
 | | |           | | | |       | | |   |Elemwise{add,no_inplace} [id CM] ''   
 | | |           | | | |       | | |     |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id CN] ''   
 | | |           | | | |       | | |     | |MakeVector{dtype='int64'} [id CO] ''   
 | | |           | | | |       | | |     |   |Subtensor{int64} [id CP] ''   
 | | |           | | | |       | | |     |   | |Shape [id CQ] ''   
 | | |           | | | |       | | |     |   | | |Subtensor{::, :int64:} [id G] ''   
 | | |           | | | |       | | |     |   | |Constant{0} [id CR]
 | | |           | | | |       | | |     |   |Elemwise{Cast{int64}} [id CS] ''   
 | | |           | | | |       | | |     |   | |TensorConstant{7} [id CT]
 | | |           | | | |       | | |     |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           | | | |       | | |     |Elemwise{mod,no_inplace} [id CU] ''   
 | | |           | | | |       | | |       |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id CV] ''   
 | | |           | | | |       | | |       | |MakeVector{dtype='int64'} [id CW] ''   
 | | |           | | | |       | | |       |   |Subtensor{int64} [id CP] ''   
 | | |           | | | |       | | |       |   |Elemwise{Cast{int64}} [id CX] ''   
 | | |           | | | |       | | |       |   | |TensorConstant{7} [id CT]
 | | |           | | | |       | | |       |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           | | | |       | | |       |TensorConstant{2} [id X]
 | | |           | | | |       | | |InplaceDimShuffle{x} [id CY] ''   
 | | |           | | | |       | |   |Elemwise{sub,no_inplace} [id CZ] ''   
 | | |           | | | |       | |     |Elemwise{Cast{float64}} [id DA] ''   
 | | |           | | | |       | |     | |TensorConstant{1.0} [id BV]
 | | |           | | | |       | |     |Elemwise{Cast{float64}} [id DB] ''   
 | | |           | | | |       | |       |TensorConstant{0.0} [id BR]
 | | |           | | | |       | |InplaceDimShuffle{x} [id DC] ''   
 | | |           | | | |       |   |Elemwise{Cast{float64}} [id DB] ''   
 | | |           | | | |       |ScalarFromTensor [id DD] ''   
 | | |           | | | |         |Elemwise{int_div,no_inplace} [id DE] ''   
 | | |           | | | |           |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DF] ''   
 | | |           | | | |           | |Shape [id DG] ''   
 | | |           | | | |           |   |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | | |           |TensorConstant{2} [id X]
 | | |           | | | |Elemwise{cos,no_inplace} [id DH] ''   
 | | |           | | |   |Elemwise{mul,no_inplace} [id DI] ''   
 | | |           | | |     |InplaceDimShuffle{x} [id DJ] ''   
 | | |           | | |     | |TensorConstant{6.283185307179586} [id DK]
 | | |           | | |     |Subtensor{int64::} [id DL] ''   
 | | |           | | |       |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | |       |ScalarFromTensor [id DM] ''   
 | | |           | | |         |Elemwise{int_div,no_inplace} [id DN] ''   
 | | |           | | |           |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DO] ''   
 | | |           | | |           | |Shape [id DP] ''   
 | | |           | | |           |   |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | |           |TensorConstant{2} [id X]
 | | |           | | |Elemwise{mul,no_inplace} [id DQ] ''   
 | | |           | |   |Elemwise{sqrt,no_inplace} [id CB] ''   
 | | |           | |   |Elemwise{sin,no_inplace} [id DR] ''   
 | | |           | |     |Elemwise{mul,no_inplace} [id DS] ''   
 | | |           | |       |InplaceDimShuffle{x} [id DT] ''   
 | | |           | |       | |TensorConstant{6.283185307179586} [id DK]
 | | |           | |       |Subtensor{int64::} [id DL] ''   
 | | |           | |ScalarFromTensor [id DU] ''   
 | | |           |   |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DV] ''   
 | | |           |     |MakeVector{dtype='int64'} [id DW] ''   
 | | |           |       |Subtensor{int64} [id CP] ''   
 | | |           |       |Elemwise{Cast{int64}} [id DX] ''   
 | | |           |       | |TensorConstant{7} [id CT]
 | | |           |       |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           |MakeVector{dtype='int64'} [id DY] ''   
 | | |             |Subtensor{int64} [id CP] ''   
 | | |             |Elemwise{Cast{int64}} [id DZ] ''   
 | | |             | |TensorConstant{7} [id CT]
 | | |             |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |MakeVector{dtype='int64'} [id EA] ''   
 | |   |Elemwise{mul,no_inplace} [id EB] ''   
 | |   | |Subtensor{int64} [id EC] ''   
 | |   | | |Shape [id ED] ''   
 | |   | | | |Join [id H] ''   
 | |   | | |Constant{0} [id EE]
 | |   | |TensorConstant{7} [id CT]
 | |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | |Alloc [id EF] ''   
 |   |TensorConstant{1.0} [id N]
 |   |Subtensor{int64} [id EG] ''   
 |   | |Shape [id EH] ''   
 |   | | |Reshape{2} [id D] ''   
 |   | |Constant{0} [id EI]
 |   |TensorConstant{1} [id C]
 |p::W [id EJ]

As you can see, all of the Prod operators are under the mrg_uniform, which is in turn part of the mrg_normal, and everything sits under ZeroGrad. My question is: how can I avoid this and tell Theano not to apply the R_op to these nodes?
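For context, T.Rop(y, x, v) computes the Jacobian-vector product J·v of y with respect to x in the direction v. The same quantity can be sketched standalone with NumPy central differences (the function f below is a hypothetical stand-in, not the model from this issue):

```python
import numpy as np

def f(x):
    # Hypothetical toy function standing in for the network output.
    return np.array([x[0] * x[1], np.sin(x[0])])

def rop_fd(f, x, v, eps=1e-6):
    # Central-difference approximation of the directional derivative J @ v,
    # i.e. what Rop(f(x), x, v) computes symbolically.
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

x = np.array([1.0, 2.0])
v = np.array([0.3, -0.1])
# Analytic J @ v for this f: [x1*v0 + x0*v1, cos(x0)*v0]
print(rop_fd(f, x, v))
```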


lamblin commented Mar 31, 2017

Right, zero_grad should also zero out the R_op, but does not.

You can use disconnected_grad instead, which should work.


botev commented Apr 3, 2017

@lamblin I've tried disconnected_grad(get_rng().normal(mu.shape)), but it still gives the same Rop error, so it does not work.


lamblin commented Apr 3, 2017

Indeed, Rop seems to ignore disconnected patterns and still calls Prod.R_op, which I did not expect.
I guess you could locally implement Prod.R_op to accept only None eval points, and return None in that case.
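A standalone sketch of that suggestion might look like the stub below. The class name and the R_op(inputs, eval_points) signature follow Theano's Op convention, but this is an illustrative stub, not the real Prod class:

```python
class ProdOp:
    """Hypothetical stand-in for Theano's Prod op, illustrating an R_op
    that only handles the disconnected case."""

    def R_op(self, inputs, eval_points):
        # If every eval point is None, the Rop direction is disconnected
        # from this node, so the Rop of its output is None as well.
        if all(ep is None for ep in eval_points):
            return [None]
        # A full implementation would compute the directional derivative here.
        raise NotImplementedError(
            "Prod R_op only implemented for disconnected (None) eval points.")
```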


botev commented May 24, 2017

@lamblin

I've successfully implemented the Rop for Prod and ZeroGrad. Could you point me to where I should put the tests? Also, the Prod Rop currently gives me the following warnings:

/home/alex/work/python/Theano/theano/tensor/basic.py:5130: UserWarning: flatten outdim parameter is deprecated, use ndim instead.
  "flatten outdim parameter is deprecated, use ndim instead.")
/home/alex/work/python/Theano/theano/tensor/basic.py:5130: UserWarning: flatten outdim parameter is deprecated, use ndim instead.
  "flatten outdim parameter is deprecated, use ndim instead.")

However, I'm not using flatten anywhere in my code, so this is coming from somewhere else, though it is not clear where (I've reused a lot of the code from the operator's Lop).


botev commented May 24, 2017

There seems to be a problem with the sampling, however. It is in fact not coming from the Rop implementation itself or from zero_grad, but from something more subtle.

The problem is that, because Rop traverses everything and the Prod ops in the sampling graph are int64, a float64 gets created somewhere, and with warn_float64 = raise this becomes an error. An illustrative example of how it fails after I've implemented the Rop is:

import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from theano.gradient import zero_grad


def th_normal(shape, mean=0, std=1, dtype=None):
    dtype = dtype or theano.config.floatX
    srng = RandomStreams(np.random.randint(1, 2147462579))
    samples = srng.normal(shape, dtype=dtype)
    samples = zero_grad(samples)
    if std != 1:
        samples *= std
    if mean != 0:
        samples += mean
    return samples


def float64_error():
    a = T.fmatrix()
    b = T.fmatrix()
    shape = (a.shape[0], 30, a.shape[1])
    epsilon = th_normal(shape)
    # epsilon = T.zeros(shape)
    out = a.dimshuffle(0, 'x', 1) * epsilon + a.dimshuffle(0, 'x', 1)
    rop = T.Rop(out, a, b)
    f = theano.function([a, b], rop)
    a_in = np.random.randn(5, 6).astype(theano.config.floatX)
    b_in = np.random.randn(5, 6).astype(theano.config.floatX)
    print(f(a_in, b_in))

Running this raises an error. Swapping in the commented-out epsilon = T.zeros(shape) line in place of the th_normal call does not. The error itself is:

Traceback (most recent call last):
  File "/home/alex/work/python/Theano/bin/rop_prod.py", line 97, in <module>
    float64_error()
  File "/home/alex/work/python/Theano/bin/rop_prod.py", line 46, in float64_error
    rop = T.Rop(out, a, b)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 293, in Rop
    _traverse(out.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  [Previous line repeated 13 more times]
  File "/home/alex/work/python/Theano/theano/gradient.py", line 288, in _traverse
    seen_nodes[node] = op.R_op(node.inputs, same_type_eval_points)
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 605, in R_op
    rop_out = bgrads[jdx] * eval_point
  File "/home/alex/work/python/Theano/theano/tensor/var.py", line 155, in __mul__
    return theano.tensor.mul(self, other)
  File "/home/alex/work/python/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 565, in make_node
    out_broadcastables)]
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 564, in <listcomp>
    for dtype, broadcastable in izip(out_dtypes,
  File "/home/alex/work/python/Theano/theano/gof/type.py", line 420, in __call__
    return utils.add_tag_trace(self.make_variable(name))
  File "/home/alex/work/python/Theano/theano/tensor/type.py", line 352, in make_variable
    return self.Variable(self, name=name)
  File "/home/alex/work/python/Theano/theano/tensor/var.py", line 821, in __init__
    raise Exception(msg)
Exception: You are creating a TensorVariable with float64 dtype. You requested an action via the Theano flag warn_float64={ignore,warn,raise,pdb}.
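The promotion itself can be reproduced outside Theano. In the failing line, bgrads[jdx] * eval_point, the Prod nodes carry int64 (they compute shapes), while the eval point is float32; a minimal NumPy sketch of that mix (variable names are stand-ins, not the actual Theano objects):

```python
import numpy as np

grad_term = np.int64(6)       # stand-in for an int64 bgrad from Prod
eval_point = np.float32(0.5)  # stand-in for a float32 eval point

# int64 * float32 promotes to float64 under NumPy's type-promotion rules,
# which is exactly what warn_float64=raise complains about.
product = grad_term * eval_point
print(product.dtype)  # float64
```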

Any suggestions?
