
Add a nondeterministic option to control MergeOptimization #6691

Closed

Conversation

brandonwillard

Ops that do not produce deterministic output (e.g. those that produce random samples) can set Op.nondeterministic = True and cause MergeOptimizer to gracefully ignore the corresponding nodes.

Also, the variable name overwriting in MergeFeature has been moved to a step after the actual node replacement in MergeOptimizer. The name is currently being overwritten in-place and before replacement, so, if the replacement fails, the node is left in an unstable state (i.e. with an unnecessarily altered name).
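
For illustration, a minimal sketch of an Op that would opt in (the Op itself and its behavior are hypothetical; only the nondeterministic class attribute is the flag proposed here):

import numpy as np
import theano.tensor as tt
from theano.gof import Apply, Op


class NoisySampleOp(Op):
    # Proposed flag: MergeOptimizer will not merge apply nodes of this Op,
    # even when their inputs are identical.
    nondeterministic = True

    def make_node(self, x):
        x = tt.as_tensor_variable(x)
        return Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        (x,) = inputs
        # different output for identical inputs, hence "nondeterministic"
        output_storage[0][0] = (x + np.random.standard_normal(x.shape)).astype(x.dtype)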

@twiecki
Contributor

twiecki commented Feb 18, 2019

Tests seem to break.

@brandonwillard
Author

brandonwillard commented Feb 18, 2019 via email

@brandonwillard
Author

You know, whenever I run the same test_flake8 locally, it catches a lot of other formatting failures; what's the deal with that? How are those failures (and their corresponding files) filtered in the Travis tests? I see a whitelist_flake8 list in test_flake8.py, but it looks like that's not stopping some of those directories from being tested, and some failures aren't even covered by that list, e.g.:

.../Theano/theano/configdefaults.py:111:21: W504 line break after binary operator

Are local flake8 settings being mistakenly applied during the test (it doesn't seem like it)?

@twiecki
Contributor

twiecki commented Feb 21, 2019

Let's see what @nouiz thinks of this.

nb_fail += 1
fgraph.merge_feature.blacklist.append(
(pairs[0][0].owner, pairs[0][1].owner))
else:
Member

That seems good.

@@ -317,6 +317,18 @@ def test_straightforward(self):
MergeOptimizer().optimize(g)
assert str(g) == "[Op1(*1 -> Op2(x, y), *1, Op2(x, z))]"

def test_nondeterministic(self):
    x, y, z = inputs()
    op2.nondeterministic = True
Member

What is the goal of that?
Currently, Theano Ops are all pseudo-deterministic for a fixed input.
Can you describe your use case in more detail?
For random number generators, we put the seed as an input of the node to keep that pseudo-deterministic behavior.

Maybe there is an existing mechanism that would allow that.

Author

As I mentioned in the initial comment, MergeOptimizer will merge nodes for Ops with the same input, which causes problems for Ops that are fundamentally non-deterministic (e.g. by erroneously merging/removing them). Also, I couldn't find a better means of addressing this at the Op or FunctionGraph level.

Otherwise, working around this design limitation by pushing the expression/enforcement of non-determinism to the input level only relocates the problem (and/or attempts to make it user-level). That's all perfectly fine if Theano is expressly designed to have a deterministic Op restriction, but, when a direct solution for (at least some) Op-level non-determinism could be as simple as this PR, I think it's worth reconsidering.

Furthermore, the problem appears for RandomFunction when one uses a shared RNG state:

import numpy as np
import theano
import theano.tensor as tt
from theano.printing import debugprint as tt_dprint
from theano.gof.fg import FunctionGraph
from theano.gof.opt import MergeOptimizer

rand_state = theano.shared(np.random.RandomState())
rvop = lambda x, y: tt.raw_random.normal(rand_state, [1], x, y)[1]

x, y, z = tt.vectors('xyz')
e = tt.add(rvop(x, y), rvop(x, y), rvop(x, z))
g = FunctionGraph([rand_state, x, y, z], [e])
g_opt = g.clone()

MergeOptimizer().optimize(g_opt)
tt_dprint([g, g_opt])

Elemwise{add,no_inplace} [id A] ''   9
 |RandomFunction{normal}.1 [id B] ''   8
 | |<RandomStateType> [id C]
 | |Elemwise{Cast{int64}} [id D] ''   7
 | | |MakeVector{dtype='int8'} [id E] ''   6
 | |   |TensorConstant{1} [id F]
 | |x [id G]
 | |y [id H]
 |RandomFunction{normal}.1 [id I] ''   5
 | |<RandomStateType> [id C]
 | |Elemwise{Cast{int64}} [id J] ''   4
 | | |MakeVector{dtype='int8'} [id K] ''   3
 | |   |TensorConstant{1} [id F]
 | |x [id G]
 | |y [id H]
 |RandomFunction{normal}.1 [id L] ''   2
   |<RandomStateType> [id C]
   |Elemwise{Cast{int64}} [id M] ''   1
   | |MakeVector{dtype='int8'} [id N] ''   0
   |   |TensorConstant{1} [id F]
   |x [id G]
   |z [id O]
Elemwise{add,no_inplace} [id P] ''   4
 |RandomFunction{normal}.1 [id Q] ''   3
 | |<RandomStateType> [id R]
 | |Elemwise{Cast{int64}} [id S] ''   1
 | | |MakeVector{dtype='int8'} [id T] ''   0
 | |   |TensorConstant{1} [id U]
 | |x [id V]
 | |y [id W]
 |RandomFunction{normal}.1 [id Q] ''   3
 |RandomFunction{normal}.1 [id X] ''   2
   |<RandomStateType> [id R]
   |Elemwise{Cast{int64}} [id S] ''   1
   |x [id V]
   |z [id Y]

Now, I'm guessing that the intended use is something like the following (because it doesn't erroneously merge/remove nodes):

# rvop is assumed here to take the RNG state explicitly and return both the
# updated state and the sample
rvop = lambda x, y, rng: tt.raw_random.normal(rng, [1], x, y)
rv1 = rvop(x, y, rand_state)
rv2 = rvop(x, y, rv1[0])
rv3 = rvop(x, z, rv2[0])
e = tt.add(rv1[1], rv2[1], rv3[1])

Unfortunately, whether that is the intended use or not, we've already demonstrated how the design problem persists even at the input/user level—e.g. by introducing errors through otherwise acceptable use of the interface and/or sacrificing the basic interface for something much more cumbersome. While this specific example could potentially be fixed by special considerations in the relevant Ops, MergeFeature, etc., or with additional documentation, both are clearly undesirable kludges that leave the underlying problems intact.

Member

That merge is not erroneous or incorrect, though. It will give the same numeric answer as the un-optimized version.
The NumPy code equivalent to "rvop" would be something like:

import copy

def rvop(x, y):
    # copy the global `rand_state` so the caller's RNG is left untouched
    r_state = copy.copy(rand_state)
    sample = r_state.normal(x, y)
    return sample

You can see the random state as a monad, that needs to be passed as input and returned as output with a different value.
In that sense, an op with the same inputs (including the state) will always return the same output, so they can be merged.
This is important because for some gradient computations, for instance, we may need the same sequence of random numbers to be generated more than once.

If you are proposing that, in some circumstances, we may want to replace one random state with another as the input of a sampling node, or merge sampling nodes that have the same inputs except for the random state, there is certainly a use for such an "unsafe" optimization. For instance, dead code elimination could be allowed to remove sampling nodes that only update the state if the sample is never used.
However I do not believe that the default, regular way of sampling numbers should become non-deterministic, when currently it is not.

Author

That merge is not erroneous or incorrect, though. It will give the same numeric answer as the un-optimized version.

That's only true if the underlying RNG state happens to get copied at some point at or before the implementation/perform level, but it is dangerously incorrect at the graph level. If Theano is designed to operate on that assumption, then it's setting an unnecessary and error-prone requirement/dependency between the graph and compute-levels—or its graphs are providing an extremely misleading representation of the relevant operations.

You can see the random state as a monad, that needs to be passed as input and returned as output with a different value.
In that sense, an op with the same inputs (including the state) will always return the same output, so they can be merged.
This is important because for some gradient computations, for instance, we may need the same sequence of random numbers to be generated more than once.

I understand what RandomStreams is trying to accomplish/model and the desire/need for side-effect-free operations, determinism, etc., but that's not clearly—or at all—related to these changes. Adding a nondeterministic static field does not violate function purity, nor would its use by FunctionGraph features and optimizations. If that were the case, then all features and optimizations that rely on Op type/class information and other non-Apply node inputs are in gross violation of this principle and are somehow "non-deterministic"—and that would be quite a few of them.

Furthermore, it's trivial to cast these ideas into a functional-like setting; simply introduce a state at the Op/type level. If you want, consider it a state monad that's analogous to RNG state, except that it's for non-deterministic Ops and only justifies the uniqueness of their outputs (for otherwise equivalent inputs). A concrete state monad probably isn't necessary, but it's perfectly do-able if need be.

Regardless, as I mentioned before, discussions about this PR keep veering into vaguely related areas of concern. Nothing about this PR is changing—or implying changes to—the functional determinism/purity of Theano Ops. It simply allows graph-processing functions to more easily treat "theoretically" distinct nodes as distinct by allowing one to designate an Op as theoretically non-deterministic. If functions use that information to produce different graphs, then that still doesn't affect the Theano purity contract: an Op still produces the same output given the same input.

Otherwise, if the concern is for graph-level function purity, then I'm even more confused, because the "non-determinism" we're discussing isn't at the graph computation level. A pure function taking graphs with non-deterministic Ops as input will still produce the same outputs given equivalent inputs.

Member

@lamblin lamblin Mar 5, 2019

If Theano is designed to operate on that assumption, then it's setting an unnecessary and error-prone requirement/dependency between the graph and compute-levels—or its graphs are providing an extremely misleading representation of the relevant operations.

Yes, the computation graph relies on the assumption that the value in the inputs of an Op will not be changed when that Op gets evaluated (except when inplace optimizations are introduced and it can be guaranteed that no operation will ever read that value again).
This implies that the value in a random state must not change during the evaluation of an Op it is an input of. The way this is implemented when the underlying RNG implementation operates inplace is by copying the random state.

I understand it is not a natural way of seeing random operations, but I would not call the graph representation misleading: the graph representation is analogous to SSA, where values do not change once assigned, and the unambiguous way of referring to "new" values of a state is to have the underlying operation ("sampling") explicitly return a new variable. The operations are explicitly deterministic given a state, and output a new state.

Or maybe "sampling" is the confusing term? I should clarify that, in this context, each of the "sampling" functions in raw_random performs a deterministic transformation from a PRNG state (and other parameters) to a "sample" and an updated PRNG state. If you have a better term for this procedure, maybe we should use it for this discussion.

Nothing about this PR is changing—or implying changes to—the functional determinism/purity of Theano Ops.

Your change seems to imply there would be non-deterministic operations in graphs in the future, which violates the assumption above and would have consequences reaching well beyond the "merge" optimization.

Otherwise, if the concern is for graph-level function purity, then I'm even more confused, because the "non-determinism" we're discussing isn't at the graph computation level.

Each Op in a graph should be pure, yes. The only computational side-effects happen at the boundaries of the graph, with the "updates" mechanism. This also ensures that different execution strategies of the same graph give the same output (modulo precision errors) regardless of execution order.

@nouiz
Member

nouiz commented Mar 3, 2019 via email

@nouiz
Member

nouiz commented Mar 3, 2019

After thinking more about this: the fact that Ops should be deterministic is fundamental in Theano. Supporting this well would require much more than this PR. For example, non-deterministic Ops break DebugMode. It also goes against the fact that, as a community, we should move in the direction of more replicability of jobs.

So I do not see why a dead project should change something so fundamental in a quick way that does not address all the problems. Even if you took the time to integrate this well everywhere, I would be against merging this, as Theano is dead.

But I'm 100% ready to help you achieve your goal. I understand that your current hack probably does what you need. Are you interested in finding another solution that would fit Theano's design? If so, we can continue this discussion.

The way you use random numbers isn't the way a Theano user should do it. This is why you have the bad impression. See the Theano documentation about random numbers for how you should do it. Mostly, you do not create the random state yourself: http://www.deeplearning.net/software/theano/tutorial/examples.html#using-random-numbers
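
The documented pattern from that tutorial looks roughly like the following sketch (the seed value is arbitrary), where RandomStreams manages the underlying RNG states for you:

import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=234)   # creates and tracks the shared RNG states
rv_n = srng.normal((2, 2))       # symbolic normally-distributed draws
f = theano.function([], rv_n)    # each call advances the hidden state
print(f())                       # successive calls give different samples
print(f())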

What is the Op you want to implement? Is it some random number generator? If so, we have a few different ones in Theano, and you can reuse the way we implemented them. Depending on what you want to add, maybe you do not need to replicate everything and can reuse much of the existing code. Do you need a different state than the NumPy state or the MRG31k3p state that we support? If you can reuse one of those state objects, you could just hook yourself into the current infrastructure.

If you can describe what your Op is doing, I can probably give you better guidance.

@brandonwillard
Author

After thinking more about this: the fact that Ops should be deterministic is fundamental in Theano. Supporting this well would require much more than this PR. For example, non-deterministic Ops break DebugMode.

Just like my question in #6686, do the current tests not cover the relevant aspects of DebugMode enough to say whether or not non-determinism-related changes would break it? If such tests are missing, then it's reasonable to make their addition a prerequisite for this kind of work.

It also goes against the fact that, as a community, we should move in the direction of more replicability of jobs.
So I do not see why a dead project should change something so fundamental in a quick way that does not address all the problems. Even if you took the time to integrate this well everywhere, I would be against merging this, as Theano is dead.

You might be taking these changes to mean significantly more than they actually are. The "non-determinism" that we're referencing is already supported in Theano (via RandomFunction and the like). Plus, it's really pseudo-non-determinism, and not the kind that fundamentally sacrifices the replicability of jobs. As a matter of fact, we could drop all the talk about "determinism" and instead focus on Ops that we want to flag as "mergeable" or not. However, determinism is a more generally applicable property that we might be interested in using/knowing outside of merging, so I went with that.

That said, this PR doesn't affect the determinism of anything. It only implements a limited MergeFeature-specific consistency check by way of a consideration for non-determinism explicitly designated at the Op level—where such a property is perhaps most appropriately specified. In other words, the changes are specific to MergeFeature and can only reasonably affect existing operations if they happen to use an Op that also defines a class-level field named nondeterministic.
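
Schematically, the guard amounts to something like the following (an illustrative sketch, not the PR's literal diff; the helper name is made up):

def nodes_mergeable(node_a, node_b):
    # nodes whose Op declares itself non-deterministic are never merge candidates
    if getattr(node_a.op, 'nondeterministic', False):
        return False
    # otherwise the existing "same Op, same inputs" criterion applies
    return node_a.op == node_b.op and node_a.inputs == node_b.inputs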

But I'm 100% ready to help you achieve your goal. I understand that your current hack probably does what you need. Are you interested in finding another solution that would fit Theano's design? If so, we can continue this discussion.

The problem is that I have other, working approaches to this, but those are the hacks. The use of a class-level member (or something similar) actually makes design sense, because the (pseudo-)non-determinism is an inherent and defining property of the Op and its use in MergeFeature leads to a succinct solution to the problem.

The way you use random numbers isn't the way a Theano user should do it. This is why you have the bad impression. See the Theano documentation about random numbers for how you should do it. Mostly, you do not create the random state yourself: http://www.deeplearning.net/software/theano/tutorial/examples.html#using-random-numbers

I addressed this in my previous comment: the underlying issues are only avoided through careful adherence to demonstrably brittle usage conventions and proxy/factory objects (e.g. RandomStreams). The usage restrictions and proxy objects could most likely be removed if it were only straight-forward to identify pseudo-non-deterministic Ops in FunctionGraph features, optimizations, etc. This PR establishes such a means, hardens a core FunctionGraph feature using said means, and doesn't appear to compromise any existing functionality.

What is the Op you want to implement? Is it some random number generator? If so, we have a few different ones in Theano, and you can reuse the way we implemented them. Depending on what you want to add, maybe you do not need to replicate everything and can reuse much of the existing code. Do you need a different state than the NumPy state or the MRG31k3p state that we support? If you can reuse one of those state objects, you could just hook yourself into the current infrastructure.

If you can describe what your Op is doing, I can probably give you better guidance.

The work I'm doing isn't at the user-level; it centers on graph manipulations involving Ops like RandomFunction. As such, it doesn't benefit from the user-level shielding that utilities like RandomStreams provide, and it can easily bring out graph-level issues like this one.
In direct relation to the user-level restrictions, by requiring developers to compensate for even the existing pseudo-non-determinism through otherwise unnecessary manipulations of the underlying RNG state (as RandomStreams appears to do), this kind of work ends up being needlessly fraught with surprising results/bugs that ultimately limit the kinds of optimizations people are willing to implement. Plus, we've only talked about a single, specific case; there are bound to be other scenarios where RNG state hacks aren't reasonable.

Anyway, I get your point about not wanting to go down these paths with Theano. That's unfortunate, because I still think it's worth the effort, but I respect your position on this, so I'll close the PR.

@nouiz
Member

nouiz commented Mar 26, 2019

I was rethinking this. To implement it well in Theano, this would need the Op to give access to its internal state, and the state should support deepcopy. With this, you would be able to make DebugMode copy the state, in addition to the other information, in order to test the determinism of the implementation. It could also be used to pickle Theano functions correctly.

Instead of chasing all the places where this state can end up, it could be stored in an extra input used only by this node. But to do this well, the node should mark that one of its outputs updates that input.

Theano has basic support for "generic" variables (any Python object). So you could make the input a dict and put all your state in it.

So mostly, you can probably do this inside the make_node of an Op, and this won't break the current way Theano works; it would not require the fix this PR is doing. For now, it would probably need a parameter passed to function() so it does not raise an error because of the inplace operations in the input graph. But this could be removed by adding a special attribute to Ops that want that.
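
A rough sketch of that suggestion (the Op and the contents of its state dict are hypothetical, and it assumes the generic shared-variable fallback accepts a plain dict):

import numpy as np
import theano
import theano.tensor as tt
from theano.gof import Apply, Op


class StatefulOp(Op):
    def make_node(self, x):
        x = tt.as_tensor_variable(x)
        # all of the Op's mutable state lives in a generic (any-Python-object)
        # shared input, so DebugMode, pickling, etc. can find and copy it
        state = theano.shared({'rng': np.random.RandomState(42)})
        return Apply(self, [state, x], [x.type()])

    def perform(self, node, inputs, output_storage):
        state, x = inputs
        # a full version would also return an updated state output and mark
        # the update, as described above
        sample = state['rng'].normal(size=x.shape)
        output_storage[0][0] = (x + sample).astype(x.dtype)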

Adding an attribute to Ops so that they do not raise an error when passed to function() seems acceptable now in Theano. No change to default Theano, but an easy extension.

What do you think of that?
