BatchAutoDiffCost: make shape consistent #7
Conversation
Thanks for the kind words and the contributions!

Could you give a small example of what this solves exactly? I always just used the `[..., i]` syntax to support both, as in the examples in the README. Is that insufficient? Also, would `BatchAutoDiffDynamics` need to change?
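(For reference, a minimal standalone numpy sketch of the `[..., i]` pattern referred to here, not using the library itself; the variable names are made up for illustration:)

```python
import numpy as np

x_single = np.array([1.0, 2.0])               # shape (state_size,)
x_batch = np.array([[1.0, 2.0], [3.0, 4.0]])  # shape (batch_size, state_size)

# The ellipsis index selects component 0 regardless of whether a batch
# dimension is present, so the same cost expression works for both shapes.
print(x_single[..., 0])  # 1.0
print(x_batch[..., 0])   # [1. 3.]
```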
```diff
@@ -353,7 +353,7 @@ def __init__(self, f, state_size, action_size, **kwargs):
         # Prepare inputs.
         self._x = x = T.dvector("x")
         self._u = u = T.dvector("u")
-        self._i = i = T.dvector("i")
+        self._i = i = T.dscalar("i")
```
I think we should keep `T.dvector` here to be consistent with the rest of the `Cost` and `Dynamics` implementations. The idea was to support computing the cost of a batch of (x, u, i) pairs in one shot. We can revisit this, but probably better in a separate PR.
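(To illustrate what "a batch of (x, u, i) pairs" could mean, a plain numpy sketch, not the library's API; `batched_cost` and the discount weight are invented for illustration:)

```python
import numpy as np

def batched_cost(x, u, i):
    # x: (batch_size, state_size), u: (batch_size, action_size),
    # i: (batch_size,), one time index per (x, u) pair,
    # which is why i would stay a T.dvector.
    weight = 0.9 ** i  # hypothetical time-varying discount
    return weight * np.square(u).sum(axis=-1)

x = np.zeros((3, 2))
u = np.ones((3, 2))
i = np.array([0.0, 1.0, 2.0])
print(batched_cost(x, u, i))  # [2.   1.8  1.62]
```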
Agreed, it seems conceptually cleaner to support evaluating at different `i` values. I'm not sure I can think of a concrete use case, and it does complicate implementing `Cost` a little, but I don't have strong feelings here. I originally made this change since `l(x, u, i)` failed and required calling `l(x, u, [i])`, but I can fix that directly.
Should `terminal` also be a vector if `i` is? I'm confused about how these interact.
It's possible to write cost functions in such a way that they support both batched and non-batched inputs, but it's error prone. In particular, it's easy to write a function which works for one kind of input but silently computes the wrong thing for the other. For example:

```python
import numpy as np
from theano import tensor as T
from ilqr.cost import BatchAutoDiffCost


class DemoCost(BatchAutoDiffCost):
    def __init__(self, axis):
        def f(x, u, i, terminal):
            if terminal:
                control_cost = T.zeros_like(x[..., 0])
            else:
                control_cost = T.square(u).sum(axis=axis)
            state_cost = T.sqrt(T.sum(x, axis=axis))
            cost = state_cost + control_cost
            cost = control_cost  # overrides the line above; only control cost is tested
            return cost
        super().__init__(f, state_size=2, action_size=2)


def similar(x, y):
    x = abs(x - y) < 1e-8
    if np.isscalar(x):
        return x
    else:
        return x.all()


def test():
    correct_cost = DemoCost(axis=-1)  # this always works
    cost0 = None
    try:
        # This one is wrong, but can be created in both master and PR.
        cost0 = DemoCost(axis=0)
    except Exception as e:
        print('DemoCost(axis=0) failed with', e)
    cost1 = None
    try:
        # This one is correct. It cannot be created in master. It can in PR.
        cost1 = DemoCost(axis=1)
    except Exception as e:
        print('DemoCost(axis=1) failed with', e)

    x = np.array([1, 2])
    u = np.array([1, 2])
    for i, cost in enumerate([cost0, cost1]):
        print('run ', i)
        if cost is None:
            continue
        try:  # on PR branch
            print('l', similar(correct_cost.l(x, u, 0), cost.l(x, u, 0)))
        except Exception as e:  # on master branch
            print('Exception: ', e)
            print('l', similar(correct_cost.l(x, u, [0]), cost.l(x, u, [0])))
        print('l_x', similar(correct_cost.l_x(x, u, 0), cost.l_x(x, u, 0)))
        print('l_u', similar(correct_cost.l_u(x, u, 0), cost.l_u(x, u, 0)))
        print('l_xx', similar(correct_cost.l_xx(x, u, 0), cost.l_xx(x, u, 0)))
        print('l_uu', similar(correct_cost.l_uu(x, u, 0), cost.l_uu(x, u, 0)))


if __name__ == '__main__':
    test()
```

On my machine, on master, the scalar call `cost.l(x, u, 0)` raises an exception and the `cost.l(x, u, [0])` fallback is taken. By contrast, in the PR branch, the scalar call succeeds.
Sorry, I forgot to address this! I understand what you mean now. For completeness' sake, I was only able to reproduce your example by removing the … And I didn't notice there was a discrepancy with the …
You're right, the … I've changed … I believe this addresses all outstanding issues in this PR; let me know if there are any other changes you'd like me to make.
First of all, thanks for the great repo -- cleanest implementation of iLQR I've found and easily extensible for my needs.
The current implementation of `BatchAutoDiffCost` passes in the inputs without a batch in the first dimension for `l`, `l_x` and `l_u` (i.e. `x.shape == (state_size,)`), and inputs with a batch for `l_xx`, `l_ux` and `l_uu` (i.e. `x.shape == (batch_size, state_size)`).

Although the cost function `l` can be written to handle both inputs, I find it quite fiddly to get this right, and it means doing things like summing over `axis=-1` since `axis=1` doesn't exist in the non-batch case. In this PR I change the input to always have a batch in the first dimension (sometimes of length 1). I also redefine `i` to be a scalar, which I think it always will be.
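(For illustration, a sketch of a cost written under the convention this PR proposes, assuming `f` always receives `x` and `u` with a leading batch dimension; `QuadraticCost` is a hypothetical example, not part of the library:)

```python
import numpy as np
from theano import tensor as T
from ilqr.cost import BatchAutoDiffCost

class QuadraticCost(BatchAutoDiffCost):
    def __init__(self):
        def f(x, u, i, terminal):
            # With a guaranteed batch dimension, summing over axis=1 is
            # unambiguous; no need for the fiddly axis=-1 workaround.
            state_cost = T.square(x).sum(axis=1)
            if terminal:
                return state_cost
            return state_cost + T.square(u).sum(axis=1)
        super().__init__(f, state_size=2, action_size=2)

cost = QuadraticCost()
# With this PR, a plain Python scalar works for i.
print(cost.l(np.array([1.0, 2.0]), np.array([0.5, 0.5]), 0))  # 5.5
```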