Fix bug where non-differentiable NumPy tensors were not unwrapped #941
Conversation
if return_arraybox:
    return params
In order to determine which parameters are trainable, we must not wrap the parameters in an autograd list.
all_params_unwrapped = [p.numpy() if isinstance(p, np.tensor) else p for p in self._all_parameter_values]

# evaluate the tape
self.set_parameters(all_params_unwrapped, trainable_only=False)
res = self.execute_device(params, device=device)
self.set_parameters(self._all_parameter_values, trainable_only=False)
The extra list comprehension and the two new set_parameters calls might add some overhead; worth benchmarking.
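For reference, a rough micro-benchmark of just the unwrapping comprehension (not the two set_parameters calls, which would need a real tape) might look like the sketch below; the 100 parameters and the repetition count are arbitrary choices, and PennyLane's wrapped NumPy is assumed to be installed.

import timeit
from pennylane import numpy as np

params = [np.tensor(0.1, requires_grad=False) for _ in range(100)]

def unwrap(ps):
    # the same comprehension as in the diff above
    return [p.numpy() if isinstance(p, np.tensor) else p for p in ps]

print(timeit.timeit(lambda: unwrap(params), number=10_000), "s for 10k calls")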
However, this is the same solution the TF and Torch interfaces currently implement, so my (gut) feeling is that the overhead is negligible.
Codecov Report
@@ Coverage Diff @@
## master #941 +/- ##
=======================================
Coverage 97.85% 97.85%
=======================================
Files 147 147
Lines 10124 10127 +3
=======================================
+ Hits 9907 9910 +3
Misses 217 217
Continue to review full report at Codecov.
This change looks kinda fundamental; is it possible to unit test it?
    if idx in self.trainable_params
]
if return_arraybox:
    return params

return autograd.builtins.list(params)
When return_arraybox=False, is the previous behavior bugged, or does params look the same? Just curious, because it looks like the default behavior is changing too, with params still being passed to autograd.builtins.list.
Hey Cody, this method previously wasn't buggy - while the code here has changed, the (public) logic remains identical.
The main change is that get_parameters() no longer calls self._update_trainable_params(), due to changes in the latter method, which required a bit of a rewrite of the subsequent lines.
So the output of this method should remain the same pre- and post-change.
Sweet, thanks!
self.set_parameters(all_params_unwrapped, trainable_only=False)
res = self.execute_device(params, device=device)
self.set_parameters(self._all_parameter_values, trainable_only=False)
Just for my understanding, what are the two new lines for?
Ah! These two additional lines are the main bugfix.
Previously, res = self.execute_device(params, device=device) would execute the device with the (new) trainable parameters. These would be unwrapped, so that an ndarray would be passed to the device rather than a np.tensor.
However, this did not consider the edge case where non-trainable parameters needed to be unwrapped. This explains why requires_grad=True worked, whereas requires_grad=False didn't!
Here, we are unwrapping all parameters on the tape, executing it, and then reverting to the original parameters (so there are no side effects for the user).
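Annotated, the new lines read roughly as follows (this is the excerpt from the diff above with explanatory comments added, not a standalone script):

# Unwrap *every* parameter on the tape, trainable or not, so the device only
# ever sees plain NumPy values.
all_params_unwrapped = [
    p.numpy() if isinstance(p, np.tensor) else p for p in self._all_parameter_values
]

self.set_parameters(all_params_unwrapped, trainable_only=False)  # swap in unwrapped values
res = self.execute_device(params, device=device)                 # execute on the device
self.set_parameters(self._all_parameter_values, trainable_only=False)  # restore originals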
I think this can be made more efficient in the long term, by making the two new set_parameters calls the only place where parameter setting happens, and removing the analogous logic (for only trainable parameters) from execute_device.
Co-authored-by: Cody Wang <speller26@gmail.com>
Looks great, thanks for tracking this down and fixing it so quickly!
Thanks @josh146!
res = self.execute_device(params, device=device)
self.set_parameters(self._all_parameter_values, trainable_only=False)
This fix comes at the cost of two more calls to set_parameters? Why do we have to reset the parameters back? Isn't it ok to just permanently cast the requires_grad tensors to numpy arrays?
Good point. The second call to set_parameters, where the unwrapped parameters are replaced with the wrapped ones, is mostly there to avoid surprising the user - that is, to undo any side effects that tape execution had.
This won't be required if using a QNode, but if using a tape directly, you might notice that the parameter types have changed before/after executing the tape on a device 🤔
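A small, purely illustrative example of the side effect being avoided (a hypothetical snippet, not taken from the codebase): if the unwrapped values were left in place, a user inspecting the tape's parameters after execution would see the type silently change.

from pennylane import numpy as np

original = [np.tensor(0.5, requires_grad=False)]

# stand-in for what execution does internally: unwrap the parameters
after_execution = [p.numpy() if isinstance(p, np.tensor) else p for p in original]

print(isinstance(original[0], np.tensor))         # True
print(isinstance(after_execution[0], np.tensor))  # False - the wrapped type is gone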
Ideally, after this PR is merged we can revisit this and remove all the set_parameters() logic in all three interfaces.
Thanks @josh146
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
Thanks @josh146!
op_idx = self._par_info[p_idx]["p_idx"]
params.append(op.data[op_idx])

return params if return_arraybox else autograd.builtins.list(params)
I think I'm not seeing the subtlety: what was the motivation for updating get_parameters() and _update_trainable_params()?
Previously, get_parameters() would call _update_trainable_params() with every call. In retrospect, I don't think this was the right approach:
- Setting the trainable parameters will in general be called less often than get_parameters() -- the dependency should go the other way. To try and mitigate this overhead, I was setting an instance attribute (params = self._all_parameter_values), which was an ugly side effect that I never liked.
- It resulted in a get_parameters() method that differed wildly from the base tape method.
In this PR, this has been corrected:
- Now, _update_trainable_params() calls get_parameters(), to get the latest parameters and determine which are trainable.
- The contents of the get_parameters() method are now almost identical to the base method, with the exception of the final return statement.
The latter approach is better and has less overhead, with the subtlety that _update_trainable_params() needs to get the non-autograd-list sequence of parameters to determine trainability. So it should have been implemented this way from the start.
As to why it was changed in this PR: I can't answer that properly. The change below to _execute (to fix the bug) caused the old approach to stop working -- through trial and error, I discovered that this approach (the better one) does work with the change, and the entire PL test suite passes.
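A minimal mock of the new dependency direction described above (illustrative only, not the actual AutogradInterface source, and assuming that requires_grad signals trainability as discussed earlier in this thread):

# Minimal mock (not the actual PennyLane source) showing the inverted
# dependency: _update_trainable_params() now calls get_parameters(),
# rather than the other way around.
import autograd.builtins
from pennylane import numpy as np

class MockTape:
    def __init__(self, params):
        self._params = list(params)
        self.trainable_params = set()

    def get_parameters(self, return_arraybox=False):
        # Mirrors the new structure: identical to a "base" implementation,
        # except for the final return statement.
        params = list(self._params)
        return params if return_arraybox else autograd.builtins.list(params)

    def _update_trainable_params(self):
        # Ask get_parameters() for the raw (non-autograd-list) parameters,
        # then inspect requires_grad to decide which indices are trainable.
        params = self.get_parameters(return_arraybox=True)
        self.trainable_params = {
            i for i, p in enumerate(params) if getattr(p, "requires_grad", False)
        }

tape = MockTape([np.tensor(0.1, requires_grad=True), np.tensor(0.2, requires_grad=False)])
tape._update_trainable_params()
print(tape.trainable_params)  # {0}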
💯
Context:

The autograd interface was not unwrapping tensors where requires_grad=False. This was intentional, since the np.tensor ndarray subclass should, due to duck typing, be accepted everywhere that an ndarray would be. However, recent changes to the np.tensor class have resulted in behaviour deviating from standard ndarrays (e.g., indexing an element returns a scalar tensor rather than a Python float), leaving some devices unable to process gate arguments correctly.
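For example, the indexing behaviour mentioned above can be seen directly (PennyLane's wrapped NumPy assumed installed; the exact types may vary by version):

import numpy as vanilla_np
from pennylane import numpy as np

a = vanilla_np.array([0.1, 0.2])
t = np.tensor([0.1, 0.2], requires_grad=False)

print(type(a[0]))  # an ordinary NumPy scalar
print(type(t[0]))  # a 0-d np.tensor, which some devices could not handle as a gate argument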
Description of the Change:

Like the PyTorch and TF interfaces, constant (non-differentiable) np.tensor parameters are now always unwrapped to plain ndarrays before device execution.
Benefits:

np.tensor objects are now never passed to the device - the device will only receive ndarray objects.

Possible Drawbacks:
This extra unwrapping may add some overhead. The overhead may be negligible, but this has not been confirmed.
Further, some devices did work fine with np.tensor gate arguments (e.g., default.qubit), which is why this issue was not noticed there. Perhaps an argument can be made that this fix should instead be applied at the per-device level, only for devices that need it, to avoid the additional overhead? But that would add more maintenance burden for device developers.

Related GitHub Issues: n/a