New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster topo #5794
Faster topo #5794
Conversation
theano/configdefaults.py
Outdated
@@ -1555,6 +1555,12 @@ def filter_vm_lazy(val): | |||
IntParam(5, lambda i: i > 0, allow_override=False), | |||
in_c_key=False) | |||
|
|||
AddConfigVar('disbale_cycle_detection', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would use a different flags type, in case we had more option in the futur about about a flag: cycle_detection, that default to "topo" and the new version would be "fast". The next version of fast that allow some sequence of inputs could be given another name like sequence. (or replace fast if fast don't have advantage over it. I think it would be that case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! So you would want it to be a StrParam
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small small small comment: disbale
-> disable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @obilaniu.
theano/gof/destroyhandler.py
Outdated
dm = getattr(app.op, 'destroy_map', None) | ||
if not dm: | ||
return | ||
inputs = sum(dm.values()) # list of app's destroyed inputs | ||
# inputs = sum(dm.values()) # list of app's destroyed inputs | ||
inputs = dm.values()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't good. An op can have multiple outputs and in that cases, we want all the inputs that can be destroyed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I fixed it. The items inside dm.values()
are them selves lists. In the new version the inputs
is a list of all the inputs that have been destroyed.
theano/gof/destroyhandler.py
Outdated
v = getattr(app2, 'view_map', {}).get(inp_idx2, []) | ||
dv = d + v | ||
assert len(dv) <= 1 | ||
if len(v) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an error here and it should be "len(dv)" not "len(v)". What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should be len(dv)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Fir the first version, we only allow that case.
theano/gof/destroyhandler.py
Outdated
@@ -924,7 +930,8 @@ def on_change_input(self, fgraph, app, i, old_r, new_r, reason): | |||
self.view_o.setdefault(new_r, OrderedSet()).add(output) | |||
|
|||
# TODO: check here only one level of fast destroy_map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
theano/gof/destroyhandler.py
Outdated
@@ -822,7 +827,8 @@ def on_import(self, fgraph, app, reason): | |||
if getattr(app.op, 'destroy_map', {}): | |||
# TODO: check here only one level of fast destroy_map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
theano/gof/destroyhandler.py
Outdated
if config.disbale_cycle_detection and self.fail_validate: | ||
self.fail_validate = False | ||
# raise self.fail_validate | ||
InconsistencyError("error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want that here:
err = self.fail_validate
self.fail_validate = False
raise err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
yes.
…On Fri, Mar 31, 2017 at 2:53 PM Reyhane Askari ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In theano/configdefaults.py
<#5794 (comment)>:
> @@ -1555,6 +1555,12 @@ def filter_vm_lazy(val):
IntParam(5, lambda i: i > 0, allow_override=False),
in_c_key=False)
+AddConfigVar('disbale_cycle_detection',
Right! So you would want it to be a StrParam?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5794 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AALC-0kyHK3G4-1mc1b2fNangV_o4Nv6ks5rrUvCgaJpZM4Mv1DA>
.
|
39e7a7d
to
d0dfbcd
Compare
@nouiz DLT tests passed. Here is the result of profiling with sb_resnet of 11 layers. I cleared the cache and ran the experiment once before the first experiment of each version.
|
theano/gof/destroyhandler.py
Outdated
# inputs = sum(dm.values()) # list of app's destroyed inputs | ||
inputs = dm.values()[0] | ||
inputs = list(set(itertools. | ||
chain.from_iterable(dm.values()))) # list of app's destroyed inputs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the problem that in python 3 dm.values() is an iterable? If so, this would be a simpler fix:
inputs = sum(list(dm.values))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the problem is that each item insidedm.values()
is itself a list. So for example :
dm = {0: [1, 2], 1:[0, 1]}
. In this case the inputs that are destroyed are
list(set(1, 2, 0, 1)) = [0, 1, 2]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the call to list. It should not be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. You are right.
theano/configdefaults.py
Outdated
"""If true it disables the cycle detection in graph. | ||
""", | ||
BoolParam(False), | ||
StrParam('topo'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add 'fast' in the list here of acceptable values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Thanks.
theano/gof/destroyhandler.py
Outdated
ords = self.orderings(fgraph) | ||
if _contains_cycle(fgraph, ords): | ||
raise InconsistencyError("Dependency graph contains cycles") | ||
for n in fgraph.apply_nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that explain why we need this. Something like:
self.fail_validate can only be a hint that maybe/probably there is a cycle.
This is because inside replace() we could record many reason to not accept a change.
But we don't know which one will fail first inside validate().
Note, if you find in your benchmark that this is still taking times, we could speed it up again by making self.fail_validate a dict where values are the nodes that failed and the values the error. This would ask us to only revalidate the values of the dict and if they don't revalidate, we would return the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
theano/gof/destroyhandler.py
Outdated
@@ -891,6 +928,8 @@ def on_change_input(self, fgraph, app, i, old_r, new_r, reason): | |||
|
|||
self.view_o.setdefault(new_r, OrderedSet()).add(output) | |||
|
|||
if config.cycle_detection == 'fast': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a global config inside the op, I would put in the init of this class a new parameter: cycle_detection that default to None. When none, it use the config value. This would make the instance use an instance value instead of a global variable. This would allow user without changes to Theano to have different Theano function with different configuration of this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
@nouiz , here is the result of benchmarking:
and here is the result of memory profiling: Run master 2: Run master 3: Run fast destroy 2: Run fast destroy 3: |
75403ea
to
afa0de3
Compare
The code look good, just waiting for the profile results. |
@nouiz , here is the result of benchmarking of sb_resnet with 29 layers:
here is the result of memory profiling; |
The results are strange. The faster topo make optimizer faster, which is great. But here is some strange facts:
|
theano/gof/destroyhandler.py
Outdated
@@ -783,6 +783,9 @@ def on_detach(self, fgraph): | |||
delattr(self.fgraph, 'destroy_handler') | |||
self.fgraph = None | |||
|
|||
def on_revert(self, fgraph): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove that method. Finally we didn't use it in this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, thanks.
theano/gof/destroyhandler.py
Outdated
self.fail_validate[app] = InconsistencyError( | ||
"Attempting to destroy indestructible variables: %s" % | ||
inp) | ||
else: | ||
if len(inp.clients) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this indentation to have if, elif, elif from the line 806
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks.
theano/gof/destroyhandler.py
Outdated
# self.fast_destroy(app, 'validate') | ||
for app in fgraph.apply_nodes: | ||
self.fast_destroy(app, 'validate') | ||
self.fail_validate = app_err_pairs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to document how it work. Here is some self notes that I would like to end up in that doc. For later when we are close to merge as this can still change.
- We needed to take care of cases where we record a possible failure, but it is another feature that cause the revert.
- We need to only verify nodes that are still in the graph. Nodes marked as failure are potential failure and need to be rechecked during validate()
- We need to handle the case where after a failed validate, there is no revert and the following validate still need to fail.
- In the case of revert, we need to track that a nodes isn't in the graph anymore and we need to remove from potential failures nodes whose inputs aren't in potential failure anymore.
If you think of other cases we try to cover, add reply with them.
Here are the results for the new commits:
|
Here are the results for the new commits:
|
theano/gof/destroyhandler.py
Outdated
self.fail_validate[app] = theano.gof.InconsistencyError( | ||
"Destroyed variable has destroy_map or view_map. " + str(reason)) | ||
"Destroyed variable has view_map. " + str(reason)) | ||
d = getattr(app2.op, 'destroy_map', {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put all that part in an else to not execute when the first one fail.
theano/gof/destroyhandler.py
Outdated
self.fast_destroy(app, 'validate') | ||
if self.fail_validate: | ||
self.fail_validate = app_err_pairs | ||
err = app_err_pairs.values()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace:
app_err_pairs.values()[0]
by
next(o.itervalues())
@nouiz , here is the benchmarking of sb_resnet with 29 layers. faster is the version that has been pushed to github. @@ -218,7 +218,7 @@ class InplaceElemwiseOptimizer(Optimizer):
check_each_change = config.tensor.insert_inplace_optimizer_validate_nb
if check_each_change == -1:
- if len(fgraph.apply_nodes) > 500:
+ if len(fgraph.apply_nodes) > 500 and fgraph.destroy_handler.algo == "topo":
check_each_change = 10
else:
check_each_change = 1 faster with clients is with this diff: @@ -316,6 +316,7 @@ class InplaceElemwiseOptimizer(Optimizer):
updated_vars = []
vars_from_inplace = []
+ one_clients = []
other_vars = []
@@ -331,11 +332,18 @@ class InplaceElemwiseOptimizer(Optimizer):
# inplace on the updated input via a sequence of
# one or more inplace operations
vars_from_inplace.append(inp_idx)
+ elif len(inp.clients) == 1:
+ one_clients.append(inp_idx)
else:
other_vars.append(inp_idx)
sorted_candidate_inputs = (updated_vars +
- vars_from_inplace + other_vars)
+ vars_from_inplace +
+ one_clients +
+ other_vars) and faster with clients and nodes is with the combination of both of the diffs.
|
I ran all the benchmarking again to double check. Here is the results:
|
Full results:
|
Here is the result of memory profiling:
|
I pushed commits to your branch that should help the profilings. Can you redo the Theano profiling and python profiling of the "node" version and master version? |
|
@nouiz , Here is the result of profiling. I removed the python profiling but it still didn't reduce the timings.
|
I think we are good to merge. Remove the change to the test suite and fix the pep8 failure. gpuarray.preallocate=-1 |
Alright. Sure. |
Why I ask new profile is because now, with the fast topo, the slowest optimization is still the gpu_elemwise_inplace. But having less inplace opt, we don't see a slow down in run time and we don't see an increase in the memory usage. So mostly, why do we do that optimization anymore? So, not for this PR, but to make sure if we still need that optimization or not, we would need those profiles:
Maybe now when we have the preallocate, the inplace is much less useful, but still useful when we disable the cache of allocated memory on the GPU. |
I didn't rebase as rebasing will bring all the commits down and we would not know in the future which profiling is for which commits. |
Here is the result of faster branch compared to the same branch with flag
|
Trigger new buildbot run |
…or don't care about the iteration order. Only local changes.
9eedd8b
to
99f568c
Compare
99f568c
to
82abb2a
Compare
Regarding issue #4233, this PR adds a theano flag disbale_cycle_detection.