
R_op is run on zero_grad nodes. #5792

Open
botev opened this issue Mar 31, 2017 · 5 comments · May be fixed by #5990


botev commented Mar 31, 2017

So, I keep getting this annoying error:

NotImplementedError: Prod{axis=None, dtype='int64', acc_dtype='int64'} of class Prod does not implement R_op. If this is a theano op, write to the theano-dev mailing list for assistance. If it is your own op, implement the R_op method.

This happens because the graph for the MRG Gaussian sampling contains a Prod operator. I've tried wrapping the sampled random variables in zero_grad, but Theano still tries to apply the R_op to them. Currently the code looks like this:

        x, x_out, _ = self.last_calc
        theano.printing.debugprint(x)
        Jv = T.Rop(x, params, v)

The graph output is:

dot [id A] ''   
 |Join [id B] ''   
 | |TensorConstant{1} [id C]
 | |Reshape{2} [id D] ''   
 | | |Elemwise{add,no_inplace} [id E] ''   
 | | | |InplaceDimShuffle{0,x,1} [id F] ''   
 | | | | |Subtensor{::, :int64:} [id G] ''   
 | | | |   |Join [id H] ''   
 | | | |   | |TensorConstant{1} [id C]
 | | | |   | |Subtensor{::, :int64:} [id I] ''   
 | | | |   | | |dot [id J] ''   
 | | | |   | | | |Join [id K] ''   
 | | | |   | | | | |TensorConstant{1} [id C]
 | | | |   | | | | |x.input [id L]
 | | | |   | | | | |Alloc [id M] ''   
 | | | |   | | | |   |TensorConstant{1.0} [id N]
 | | | |   | | | |   |Subtensor{int64} [id O] ''   
 | | | |   | | | |   | |Shape [id P] ''   
 | | | |   | | | |   | | |x.input [id L]
 | | | |   | | | |   | |Constant{0} [id Q]
 | | | |   | | | |   |TensorConstant{1} [id C]
 | | | |   | | | |encode_1::W [id R]
 | | | |   | | |ScalarFromTensor [id S] ''   
 | | | |   | |   |Elemwise{int_div,no_inplace} [id T] ''   
 | | | |   | |     |Subtensor{int64} [id U] ''   
 | | | |   | |     | |Shape [id V] ''   
 | | | |   | |     | | |dot [id J] ''   
 | | | |   | |     | |Constant{1} [id W]
 | | | |   | |     |TensorConstant{2} [id X]
 | | | |   | |Elemwise{add,no_inplace} [id Y] ''   
 | | | |   |   |softplus [id Z] ''   
 | | | |   |   | |Subtensor{::, int64::} [id BA] ''   
 | | | |   |   |   |dot [id J] ''   
 | | | |   |   |   |ScalarFromTensor [id BB] ''   
 | | | |   |   |     |Elemwise{int_div,no_inplace} [id T] ''   
 | | | |   |   |InplaceDimShuffle{x,x} [id BC] ''   
 | | | |   |     |TensorConstant{1e-06} [id BD]
 | | | |   |ScalarFromTensor [id BE] ''   
 | | | |     |Elemwise{int_div,no_inplace} [id BF] ''   
 | | | |       |Subtensor{int64} [id BG] ''   
 | | | |       | |Shape [id BH] ''   
 | | | |       | | |Join [id H] ''   
 | | | |       | |Constant{1} [id BI]
 | | | |       |TensorConstant{2} [id X]
 | | | |Elemwise{mul,no_inplace} [id BJ] ''   
 | | |   |InplaceDimShuffle{0,x,1} [id BK] ''   
 | | |   | |Subtensor{::, int64::} [id BL] ''   
 | | |   |   |Join [id H] ''   
 | | |   |   |ScalarFromTensor [id BM] ''   
 | | |   |     |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |   |ZeroGrad [id BN] ''   
 | | |     |Elemwise{add,no_inplace} [id BO] ''   
 | | |       |InplaceDimShuffle{x,x,x} [id BP] ''   
 | | |       | |Elemwise{Cast{float64}} [id BQ] ''   
 | | |       |   |TensorConstant{0.0} [id BR]
 | | |       |Elemwise{mul,no_inplace} [id BS] ''   
 | | |         |InplaceDimShuffle{x,x,x} [id BT] ''   
 | | |         | |Elemwise{Cast{float64}} [id BU] ''   
 | | |         |   |TensorConstant{1.0} [id BV]
 | | |         |Reshape{3} [id BW] ''   
 | | |           |Subtensor{:int64:} [id BX] ''   
 | | |           | |Join [id BY] ''   
 | | |           | | |TensorConstant{0} [id BZ]
 | | |           | | |Elemwise{mul,no_inplace} [id CA] ''   
 | | |           | | | |Elemwise{sqrt,no_inplace} [id CB] ''   
 | | |           | | | | |Elemwise{mul,no_inplace} [id CC] ''   
 | | |           | | | |   |InplaceDimShuffle{x} [id CD] ''   
 | | |           | | | |   | |TensorConstant{-2.0} [id CE]
 | | |           | | | |   |Elemwise{log,no_inplace} [id CF] ''   
 | | |           | | | |     |Subtensor{:int64:} [id CG] ''   
 | | |           | | | |       |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | | |       | |Elemwise{mul,no_inplace} [id CI] ''   
 | | |           | | | |       | | |mrg_uniform{TensorType(float64, vector),no_inplace}.1 [id CJ] ''   
 | | |           | | | |       | | | |<TensorType(int32, matrix)> [id CK]
 | | |           | | | |       | | | |MakeVector{dtype='int64'} [id CL] ''   
 | | |           | | | |       | | |   |Elemwise{add,no_inplace} [id CM] ''   
 | | |           | | | |       | | |     |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id CN] ''   
 | | |           | | | |       | | |     | |MakeVector{dtype='int64'} [id CO] ''   
 | | |           | | | |       | | |     |   |Subtensor{int64} [id CP] ''   
 | | |           | | | |       | | |     |   | |Shape [id CQ] ''   
 | | |           | | | |       | | |     |   | | |Subtensor{::, :int64:} [id G] ''   
 | | |           | | | |       | | |     |   | |Constant{0} [id CR]
 | | |           | | | |       | | |     |   |Elemwise{Cast{int64}} [id CS] ''   
 | | |           | | | |       | | |     |   | |TensorConstant{7} [id CT]
 | | |           | | | |       | | |     |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           | | | |       | | |     |Elemwise{mod,no_inplace} [id CU] ''   
 | | |           | | | |       | | |       |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id CV] ''   
 | | |           | | | |       | | |       | |MakeVector{dtype='int64'} [id CW] ''   
 | | |           | | | |       | | |       |   |Subtensor{int64} [id CP] ''   
 | | |           | | | |       | | |       |   |Elemwise{Cast{int64}} [id CX] ''   
 | | |           | | | |       | | |       |   | |TensorConstant{7} [id CT]
 | | |           | | | |       | | |       |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           | | | |       | | |       |TensorConstant{2} [id X]
 | | |           | | | |       | | |InplaceDimShuffle{x} [id CY] ''   
 | | |           | | | |       | |   |Elemwise{sub,no_inplace} [id CZ] ''   
 | | |           | | | |       | |     |Elemwise{Cast{float64}} [id DA] ''   
 | | |           | | | |       | |     | |TensorConstant{1.0} [id BV]
 | | |           | | | |       | |     |Elemwise{Cast{float64}} [id DB] ''   
 | | |           | | | |       | |       |TensorConstant{0.0} [id BR]
 | | |           | | | |       | |InplaceDimShuffle{x} [id DC] ''   
 | | |           | | | |       |   |Elemwise{Cast{float64}} [id DB] ''   
 | | |           | | | |       |ScalarFromTensor [id DD] ''   
 | | |           | | | |         |Elemwise{int_div,no_inplace} [id DE] ''   
 | | |           | | | |           |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DF] ''   
 | | |           | | | |           | |Shape [id DG] ''   
 | | |           | | | |           |   |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | | |           |TensorConstant{2} [id X]
 | | |           | | | |Elemwise{cos,no_inplace} [id DH] ''   
 | | |           | | |   |Elemwise{mul,no_inplace} [id DI] ''   
 | | |           | | |     |InplaceDimShuffle{x} [id DJ] ''   
 | | |           | | |     | |TensorConstant{6.283185307179586} [id DK]
 | | |           | | |     |Subtensor{int64::} [id DL] ''   
 | | |           | | |       |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | |       |ScalarFromTensor [id DM] ''   
 | | |           | | |         |Elemwise{int_div,no_inplace} [id DN] ''   
 | | |           | | |           |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DO] ''   
 | | |           | | |           | |Shape [id DP] ''   
 | | |           | | |           |   |Elemwise{add,no_inplace} [id CH] ''   
 | | |           | | |           |TensorConstant{2} [id X]
 | | |           | | |Elemwise{mul,no_inplace} [id DQ] ''   
 | | |           | |   |Elemwise{sqrt,no_inplace} [id CB] ''   
 | | |           | |   |Elemwise{sin,no_inplace} [id DR] ''   
 | | |           | |     |Elemwise{mul,no_inplace} [id DS] ''   
 | | |           | |       |InplaceDimShuffle{x} [id DT] ''   
 | | |           | |       | |TensorConstant{6.283185307179586} [id DK]
 | | |           | |       |Subtensor{int64::} [id DL] ''   
 | | |           | |ScalarFromTensor [id DU] ''   
 | | |           |   |Prod{axis=None, dtype='int64', acc_dtype='int64'} [id DV] ''   
 | | |           |     |MakeVector{dtype='int64'} [id DW] ''   
 | | |           |       |Subtensor{int64} [id CP] ''   
 | | |           |       |Elemwise{Cast{int64}} [id DX] ''   
 | | |           |       | |TensorConstant{7} [id CT]
 | | |           |       |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |           |MakeVector{dtype='int64'} [id DY] ''   
 | | |             |Subtensor{int64} [id CP] ''   
 | | |             |Elemwise{Cast{int64}} [id DZ] ''   
 | | |             | |TensorConstant{7} [id CT]
 | | |             |Elemwise{int_div,no_inplace} [id BF] ''   
 | | |MakeVector{dtype='int64'} [id EA] ''   
 | |   |Elemwise{mul,no_inplace} [id EB] ''   
 | |   | |Subtensor{int64} [id EC] ''   
 | |   | | |Shape [id ED] ''   
 | |   | | | |Join [id H] ''   
 | |   | | |Constant{0} [id EE]
 | |   | |TensorConstant{7} [id CT]
 | |   |Elemwise{int_div,no_inplace} [id BF] ''   
 | |Alloc [id EF] ''   
 |   |TensorConstant{1.0} [id N]
 |   |Subtensor{int64} [id EG] ''   
 |   | |Shape [id EH] ''   
 |   | | |Reshape{2} [id D] ''   
 |   | |Constant{0} [id EI]
 |   |TensorConstant{1} [id C]
 |p::W [id EJ]

As you can see, all of the Prod operators are under the mrg_uniform, which is in turn part of the mrg_normal, and everything sits under ZeroGrad. My question is: how can I avoid this and tell Theano not to apply the R_op to these nodes?
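For context, T.Rop(y, x, v) computes the Jacobian-vector product J·v of y with respect to x in the direction v. The same quantity can be sketched standalone with NumPy central differences (the function f below is a hypothetical stand-in, not the model from this issue):

```python
import numpy as np

def f(x):
    # Hypothetical toy function standing in for the network output.
    return np.array([x[0] * x[1], np.sin(x[0])])

def rop_fd(f, x, v, eps=1e-6):
    # Central-difference approximation of the directional derivative J @ v,
    # i.e. what Rop(f(x), x, v) computes symbolically.
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

x = np.array([1.0, 2.0])
v = np.array([0.3, -0.1])
# Analytic J @ v for this f: [x1*v0 + x0*v1, cos(x0)*v0]
print(rop_fd(f, x, v))
```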


lamblin commented Mar 31, 2017

Right, zero_grad should also zero out the R_op, but does not.

You can use disconnected_grad instead, which should work.


botev commented Apr 3, 2017

@lamblin I've tried disconnected_grad(get_rng().normal(mu.shape)), but it still gives the same Rop error, so it does not work.


lamblin commented Apr 3, 2017

Indeed, Rop seems to ignore disconnected patterns and still calls Prod.R_op, which I did not expect.
I guess you could locally implement Prod.R_op to accept only None eval points, and return None in that case.
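A standalone sketch of that suggestion might look like the stub below. The class name and the R_op(inputs, eval_points) signature follow Theano's Op convention, but this is an illustrative stub, not the real Prod class:

```python
class ProdOp:
    """Hypothetical stand-in for Theano's Prod op, illustrating an R_op
    that only handles the disconnected case."""

    def R_op(self, inputs, eval_points):
        # If every eval point is None, the Rop direction is disconnected
        # from this node, so the Rop of its output is None as well.
        if all(ep is None for ep in eval_points):
            return [None]
        # A full implementation would compute the directional derivative here.
        raise NotImplementedError(
            "Prod R_op only implemented for disconnected (None) eval points.")
```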


botev commented May 24, 2017

@lamblin

I've successfully implemented the Rop for Prod and ZeroGrad. Could you point me to where I should put the tests? Also, the Prod Rop currently gives me the following warnings:

/home/alex/work/python/Theano/theano/tensor/basic.py:5130: UserWarning: flatten outdim parameter is deprecated, use ndim instead.
  "flatten outdim parameter is deprecated, use ndim instead.")
/home/alex/work/python/Theano/theano/tensor/basic.py:5130: UserWarning: flatten outdim parameter is deprecated, use ndim instead.
  "flatten outdim parameter is deprecated, use ndim instead.")

However, I'm not using flatten anywhere in my code, so this is coming from somewhere else, though it is not clear where (I've reused a lot of the code from the operator's Lop).


botev commented May 24, 2017

There seems to be a problem with the sampling, however. It is in fact not coming from the Rop implementation itself or from zero_grad, but from something more subtle.

The problem is that, because Rop traverses everything and the Prod ops in the sampling graph are int64, a float64 gets created somewhere, and with warn_float64 = raise this becomes an error. An illustrative example of how it fails after I've implemented the Rop is:

import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from theano.gradient import zero_grad


def th_normal(shape, mean=0, std=1, dtype=None):
    dtype = dtype or theano.config.floatX
    srng = RandomStreams(np.random.randint(1, 2147462579))
    samples = srng.normal(shape, dtype=dtype)
    samples = zero_grad(samples)
    if std != 1:
        samples *= std
    if mean != 0:
        samples += mean
    return samples


def float64_error():
    a = T.fmatrix()
    b = T.fmatrix()
    shape = (a.shape[0], 30, a.shape[1])
    epsilon = th_normal(shape)
    # epsilon = T.zeros(shape)
    out = a.dimshuffle(0, 'x', 1) * epsilon + a.dimshuffle(0, 'x', 1)
    rop = T.Rop(out, a, b)
    f = theano.function([a, b], rop)
    a_in = np.random.randn(5, 6).astype(theano.config.floatX)
    b_in = np.random.randn(5, 6).astype(theano.config.floatX)
    print(f(a_in, b_in))

Running this raises an error. Swapping in the commented-out epsilon = T.zeros(shape) line in place of the th_normal call does not. The error itself is:

Traceback (most recent call last):
  File "/home/alex/work/python/Theano/bin/rop_prod.py", line 97, in <module>
    float64_error()
  File "/home/alex/work/python/Theano/bin/rop_prod.py", line 46, in float64_error
    rop = T.Rop(out, a, b)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 293, in Rop
    _traverse(out.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  File "/home/alex/work/python/Theano/theano/gradient.py", line 256, in _traverse
    _traverse(inp.owner)
  [Previous line repeated 13 more times]
  File "/home/alex/work/python/Theano/theano/gradient.py", line 288, in _traverse
    seen_nodes[node] = op.R_op(node.inputs, same_type_eval_points)
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 605, in R_op
    rop_out = bgrads[jdx] * eval_point
  File "/home/alex/work/python/Theano/theano/tensor/var.py", line 155, in __mul__
    return theano.tensor.mul(self, other)
  File "/home/alex/work/python/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 565, in make_node
    out_broadcastables)]
  File "/home/alex/work/python/Theano/theano/tensor/elemwise.py", line 564, in <listcomp>
    for dtype, broadcastable in izip(out_dtypes,
  File "/home/alex/work/python/Theano/theano/gof/type.py", line 420, in __call__
    return utils.add_tag_trace(self.make_variable(name))
  File "/home/alex/work/python/Theano/theano/tensor/type.py", line 352, in make_variable
    return self.Variable(self, name=name)
  File "/home/alex/work/python/Theano/theano/tensor/var.py", line 821, in __init__
    raise Exception(msg)
Exception: You are creating a TensorVariable with float64 dtype. You requested an action via the Theano flag warn_float64={ignore,warn,raise,pdb}.
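The promotion itself can be reproduced outside Theano. In the failing line, bgrads[jdx] * eval_point, the Prod nodes carry int64 (they compute shapes), while the eval point is float32; a minimal NumPy sketch of that mix (variable names are stand-ins, not the actual Theano objects):

```python
import numpy as np

grad_term = np.int64(6)       # stand-in for an int64 bgrad from Prod
eval_point = np.float32(0.5)  # stand-in for a float32 eval point

# int64 * float32 promotes to float64 under NumPy's type-promotion rules,
# which is exactly what warn_float64=raise complains about.
product = grad_term * eval_point
print(product.dtype)  # float64
```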

Any suggestions?
