Add a `nondeterministic` option to control MergeOptimization #6691
Conversation
Tests seem to break.
It's a simple code formatting thing; will update soon
`Op`s that do not produce deterministic output (e.g. those that produce random samples) can set `Op.nondeterministic = True` and cause `MergeOptimizer` to gracefully ignore the corresponding nodes.
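For illustration, a rough sketch of the idea (this is not the PR's actual diff; `mergeable` below is a hypothetical helper):

```python
# Hypothetical helper showing how a merge pass could consult the proposed
# flag; MergeOptimizer's real logic is more involved than this.
def mergeable(node_a, node_b):
    # Nodes whose Op is flagged as non-deterministic are never treated as
    # interchangeable, even when they apply the same Op to identical inputs.
    if getattr(node_a.op, 'nondeterministic', False):
        return False
    if getattr(node_b.op, 'nondeterministic', False):
        return False
    return node_a.op == node_b.op and node_a.inputs == node_b.inputs
```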
You know, whenever I run the same [...]. Are these local [...]?
Seeing what @nouiz thinks of this.
nb_fail += 1
fgraph.merge_feature.blacklist.append(
    (pairs[0][0].owner, pairs[0][1].owner))
else:
That seems good.
@@ -317,6 +317,18 @@ def test_straightforward(self):
         MergeOptimizer().optimize(g)
         assert str(g) == "[Op1(*1 -> Op2(x, y), *1, Op2(x, z))]"

+    def test_nondeterministic(self):
+        x, y, z = inputs()
+        op2.nondeterministic = True
What is the goal of that?
Currently, Theano ops are all pseudo-deterministic for a fixed input.
Can you describe your use case in more detail?
For random number generators, we put the seed as an input of the node to keep that pseudo-deterministic behavior.
Maybe there is an existing mechanism that would allow that.
As I mentioned in the initial comment, `MergeOptimizer` will merge nodes for `Op`s with the same input, which causes problems for `Op`s that are fundamentally non-deterministic (e.g. by erroneously merging/removing them). Also, I couldn't find a better means of addressing this at the `Op` or `FunctionGraph` level.

Otherwise, working around this design limitation by pushing the expression/enforcement of non-determinism to the input level only relocates the problem (and/or attempts to make it user-level). That's all perfectly fine if Theano is expressly designed to have a deterministic `Op` restriction, but, when a direct solution for (at least some) `Op`-level non-determinism could be as simple as this PR, I think it's worth reconsidering.

Furthermore, the problem appears for `RandomFunction` when one uses a shared RNG state:
import numpy as np
import theano
import theano.tensor as tt
from theano.printing import debugprint as tt_dprint
from theano.gof.fg import FunctionGraph
from theano.gof.opt import MergeOptimizer
rand_state = theano.shared(np.random.RandomState())
rvop = lambda x, y: tt.raw_random.normal(rand_state, [1], x, y)[1]
x, y, z = tt.vectors('xyz')
e = tt.add(rvop(x, y), rvop(x, y), rvop(x, z))
g = FunctionGraph([rand_state, x, y, z], [e])
g_opt = g.clone()
MergeOptimizer().optimize(g_opt)
tt_dprint([g, g_opt])
Elemwise{add,no_inplace} [id A] '' 9
|RandomFunction{normal}.1 [id B] '' 8
| |<RandomStateType> [id C]
| |Elemwise{Cast{int64}} [id D] '' 7
| | |MakeVector{dtype='int8'} [id E] '' 6
| | |TensorConstant{1} [id F]
| |x [id G]
| |y [id H]
|RandomFunction{normal}.1 [id I] '' 5
| |<RandomStateType> [id C]
| |Elemwise{Cast{int64}} [id J] '' 4
| | |MakeVector{dtype='int8'} [id K] '' 3
| | |TensorConstant{1} [id F]
| |x [id G]
| |y [id H]
|RandomFunction{normal}.1 [id L] '' 2
|<RandomStateType> [id C]
|Elemwise{Cast{int64}} [id M] '' 1
| |MakeVector{dtype='int8'} [id N] '' 0
| |TensorConstant{1} [id F]
|x [id G]
|z [id O]
Elemwise{add,no_inplace} [id P] '' 4
|RandomFunction{normal}.1 [id Q] '' 3
| |<RandomStateType> [id R]
| |Elemwise{Cast{int64}} [id S] '' 1
| | |MakeVector{dtype='int8'} [id T] '' 0
| | |TensorConstant{1} [id U]
| |x [id V]
| |y [id W]
|RandomFunction{normal}.1 [id Q] '' 3
|RandomFunction{normal}.1 [id X] '' 2
|<RandomStateType> [id R]
|Elemwise{Cast{int64}} [id S] '' 1
|x [id V]
|z [id Y]
Now, I'm guessing that the intended use is something like the following (because it doesn't erroneously merge/remove nodes):
# (Here `rvop` is assumed to take the RNG state explicitly as a third argument,
#  e.g. `lambda x, y, state: tt.raw_random.normal(state, [1], x, y)`.)
rv1 = rvop(x, y, rand_state)
rv2 = rvop(x, y, rv1[0])
rv3 = rvop(x, z, rv2[0])
e = tt.add(rv1[1], rv2[1], rv3[1])
Unfortunately, whether that is the intended use or not, we've already demonstrated how the design problem persists even at the input/user level—e.g. by introducing errors through otherwise acceptable use of the interface and/or sacrificing the basic interface for something much more cumbersome. While this specific example could potentially be fixed by special considerations in the relevant `Op`s, `MergeFeature`, etc., or with additional documentation, both are clearly undesirable kludges that leave the underlying problems intact.
That merge is not erroneous or incorrect, though. It will give the same numeric answer as the un-optimized version.
The NumPy code equivalent to "rvop" would be something like:
import copy  # needed for copy.copy below

def rvop(x, y):
    # the (global) rand_state is copied, never mutated in place
    r_state = copy.copy(rand_state)
    sample = r_state.normal(x, y)
    return sample
You can see the random state as a monad that needs to be passed as input and returned as output with a different value.
In that sense, an op with the same inputs (including the state) will always return the same output, so they can be merged.
This is important because for some gradient computations, for instance, we may need the same sequence of random numbers to be generated more than once.
If you are proposing that, in some circumstances, we may want to replace one random state with another as the input of a sampling node, or merge sampling nodes that have the same inputs except for the random state, there is certainly a use for such an "unsafe" optimization. For instance, dead code elimination could be allowed to remove sampling nodes that only update the state if the sample is never used.
However, I do not believe that the default, regular way of sampling numbers should become non-deterministic, when currently it is not.
> That merge is not erroneous or incorrect, though. It will give the same numeric answer as the un-optimized version.

That's only true if the underlying RNG state happens to get copied at some point at or before the implementation/`perform` level, but it is dangerously incorrect at the graph level. If Theano is designed to operate on that assumption, then it's setting an unnecessary and error-prone requirement/dependency between the graph and compute levels—or its graphs are providing an extremely misleading representation of the relevant operations.
> You can see the random state as a monad that needs to be passed as input and returned as output with a different value.
> In that sense, an op with the same inputs (including the state) will always return the same output, so they can be merged.
> This is important because for some gradient computations, for instance, we may need the same sequence of random numbers to be generated more than once.
I understand what `RandomStreams` is trying to accomplish/model and the desire/need for side-effect-free operations, determinism, etc., but that's not clearly—or at all—related to these changes. Adding a `nondeterministic` static field does not violate function purity, nor would its use by `FunctionGraph` features and optimizations. If that were the case, then all features and optimizations that rely on `Op` type/class information and other non-`Apply`-node inputs would be in gross violation of this principle and somehow "non-deterministic"—and that would be quite a few of them.

Furthermore, it's trivial to cast these ideas into a functional-like setting; simply introduce a state at the `Op`/type level. If you want, consider it a state monad that's analogous to RNG state, except that it's for non-deterministic `Op`s and only justifies the uniqueness of their outputs (for otherwise equivalent inputs). A concrete state monad probably isn't necessary, but it's perfectly doable if need be.

Regardless, as I mentioned before, discussions about this PR keep veering into vaguely related areas of concern. Nothing about this PR is changing—or implying changes to—the functional determinism/purity of Theano `Op`s. It simply allows graph-processing functions to more easily treat "theoretically" distinct nodes as distinct by allowing one to designate an `Op` as theoretically non-deterministic. If functions use that information to produce different graphs, then that still doesn't affect the Theano purity contract: an `Op` still produces the same output given the same input.

Otherwise, if the concern is for graph-level function purity, then I'm even more confused, because the "non-determinism" we're discussing isn't at the graph computation level. A pure function taking graphs with non-deterministic `Op`s as input will still produce the same outputs given equivalent inputs.
> If Theano is designed to operate on that assumption, then it's setting an unnecessary and error-prone requirement/dependency between the graph and compute levels—or its graphs are providing an extremely misleading representation of the relevant operations.
Yes, the computation graph relies on the assumption that the value in the inputs of an Op will not be changed when that Op gets evaluated (except if it can be guaranteed that no operation will ever read that value again, which is when inplace optimizations are introduced).

This implies that the value in a random state must not change during the evaluation of an Op it is an input of. The way this is implemented, when the underlying RNG implementation operates in place, is by copying the random state.

I understand it is not a natural way of seeing random operations, but I would not call the graph representation misleading: the graph representation is analogous to SSA, where values do not change once assigned, and the unambiguous way of referring to "new" values of a state is to have the underlying operation ("sampling") explicitly return a new variable. The operations are explicitly deterministic given a state, and output a new state.

Or maybe "sampling" is the confusing term? I should clarify that, in that context, each of the "sampling" functions in raw_random performs a deterministic transformation from a PRNG state (and other parameters) to a "sample" and an updated PRNG state. If you have a better term for this procedure, maybe we should use it for this discussion.
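To make that concrete, here is a small NumPy-only illustration (not Theano code) of sampling as a deterministic transformation of an explicit state:

```python
import copy

import numpy as np

def sample_normal(state, avg, std):
    new_state = copy.deepcopy(state)    # the input state is never mutated
    value = new_state.normal(avg, std)  # deterministic given state, avg, std
    return new_state, value             # the updated state is returned explicitly

s0 = np.random.RandomState(0)
s1, a = sample_normal(s0, 0.0, 1.0)
_, b = sample_normal(s0, 0.0, 1.0)
assert a == b  # same state and parameters -> the same "sample"
```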
> Nothing about this PR is changing—or implying changes to—the functional determinism/purity of Theano `Op`s.
Your change seems to imply there would be non-deterministic operations in graphs in the future, which would violate the assumption above and would have consequences reaching well beyond the "merge" optimization.
> Otherwise, if the concern is for graph-level function purity, then I'm even more confused, because the "non-determinism" we're discussing isn't at the graph computation level.
Each Op in a graph should be pure, yes. The only computational side-effects happen at the boundaries of the graph, with the "updates" mechanism. This also ensures that different execution strategies for the same graph give the same results (modulo precision errors) regardless of execution order.
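For reference, a minimal (illustrative) use of that `updates` mechanism at the function boundary:

```python
import theano
import theano.tensor as tt

counter = theano.shared(0, name='counter')
inc = tt.iscalar('inc')
# Inside the graph, `counter` and `counter + inc` are ordinary pure values;
# the shared variable only changes when the compiled function is called.
step = theano.function([inc], counter, updates=[(counter, counter + inc)])
step(2)  # returns the old value of `counter`, then applies the update
```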
For flake8, this is probably due to a different version of flake8.

Theano isn't developed anymore, so I'm reluctant to include new features. If this is accepted, it will need to be documented. I'll think more about it.
After thinking more about this: the fact that ops should be deterministic is fundamental in Theano. Supporting this well would require much more than this PR. For example, non-deterministic ops break DebugMode. It also goes against the fact that, as a community, we should move in the direction of more replicability of jobs. So I do not see why a dead project should change something so fundamental in a quick way that does not address all the problems. Even if you took the time to integrate this well everywhere, I would be against merging this, as Theano is dead. But I'm 100% ready to help you achieve your goal.

I understand that your current hack probably does what you need. Are you interested in finding another solution that would fit Theano's design? If so, we can continue this discussion.

The way you use random numbers isn't the way a Theano user should do it in Theano; this is why you have a bad impression. See the Theano documentation about random numbers for how you should do it. Mostly, you do not create the random state yourself: http://www.deeplearning.net/software/theano/tutorial/examples.html#using-random-numbers

What is the op you want to implement? Is it some random number generator? If so, we have a few different ones in Theano, and you can reuse the way we implemented them. Depending on what you want to add, maybe you do not need to replicate everything and can reuse much of the existing code. Do you need a different state than the NumPy state or the MRG31k3p generator that we support? If you can reuse one of those state objects, you could just hook into the current infrastructure. If you can describe what your op is doing, I can probably give you better guidance.
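For reference, a minimal sketch of the documented `RandomStreams` usage that the linked tutorial describes (the seed and shapes here are only illustrative):

```python
import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=234)  # the RNG state is created and managed for you
# Two draws with identical parameters are still distinct graph nodes, each
# with its own state updates, so nothing gets merged away.
rv_a = srng.normal((2,))
rv_b = srng.normal((2,))
f = theano.function([], rv_a + rv_b)
print(f())
```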
Just like my question in #6686, do the current tests not cover the relevant aspects of [...]?

You might be taking these changes to mean significantly more than they actually are. The "non-determinism" that we're referencing is already supported in Theano (via [...]). That said, this PR doesn't affect the determinism of anything. It only implements a limited [...].

The problem is that I have other, working approaches to this, but those are the hacks. The use of a class-level member (or something similar) actually makes design sense, because the (pseudo-)non-determinism is an inherent and defining property of the `Op`.

I addressed this in my previous comment: the underlying issues are only avoided through careful adherence to demonstrably brittle usage conventions and proxy/factory objects (e.g. [...]).

The work I'm doing isn't at the user-level; it centers on graph manipulations involving [...]. Anyway, I get your point about not wanting to go down these paths with Theano. That's unfortunate, because I still think it's worth the effort, but I respect your position on this, so I'll close the PR.
I was rethinking about this. To implement it well in Theano, the op would need to give access to its internal state, and the state should support deepcopy. With this, you would be able to make DebugMode copy the state, in addition to the other information, in order to test the determinism of the implementation. It could also be used to pickle Theano functions correctly.

Instead of chasing all the places where this state can end up, it could be stored in an extra input used only by this node. But to do this well, the node should mark that one of its outputs updates that input. Theano has basic support for "generic" variables (any Python object), so you could make the input a dict and put all your state in it. So, mostly, you can probably do this inside the [...].

Adding an attribute to ops so that they do not raise an error when passed to function() seems like something acceptable now in Theano. No change in default Theano, but an easy extension. What do you think of that?
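To make that suggestion slightly more concrete, here is a very rough sketch of an op that carries its state in a "generic" input (everything here is illustrative rather than an existing Theano op, and a complete version would also have to declare that it updates/consumes that input, e.g. via `destroy_map`):

```python
import numpy as np
import theano.tensor as tt
from theano import gof
from theano.gof.type import generic  # a Type that wraps an arbitrary Python object

class StatefulSample(gof.Op):
    """Hypothetical op whose mutable state lives in a generic input."""
    __props__ = ()

    def make_node(self, state, x):
        x = tt.as_tensor_variable(x)
        return gof.Apply(self, [state, x], [x.type()])

    def perform(self, node, inputs, output_storage):
        state, x = inputs
        # The state dict is private to this node; drawing from its RNG is
        # what makes the output differ between otherwise identical calls.
        sample = x + state['rng'].normal(size=x.shape)
        output_storage[0][0] = sample.astype(x.dtype)

# A real version would thread/update this state input rather than hide it in
# a Constant; this only shows the generic input being passed around.
state = gof.Constant(generic, {'rng': np.random.RandomState(123)})
x = tt.vector('x')
y = StatefulSample()(state, x)
```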
`Op`s that do not produce deterministic output (e.g. those that produce random samples) can set `Op.nondeterministic = True` and cause `MergeOptimizer` to gracefully ignore the corresponding nodes.

Also, the variable name overwriting in `MergeFeature` has been moved to a step after the actual node replacement in `MergeOptimizer`. The name is currently being overwritten in-place and before replacement, so, if the replacement fails, the node is left in an unstable state (i.e. with an unnecessarily altered name).
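A hypothetical illustration of that ordering (this is not the actual `MergeOptimizer` code; the helper and names are made up):

```python
def merge_and_transfer_name(fgraph, kept_var, removed_var):
    # Perform the (validated) replacement first...
    try:
        fgraph.replace_validate(removed_var, kept_var, reason='merge')
    except Exception:
        # ...so that a failed/rejected replacement leaves all names untouched.
        return False
    # Only once the replacement has succeeded is the name carried over.
    if kept_var.name is None and removed_var.name is not None:
        kept_var.name = removed_var.name
    return True
```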