BatchAutoDiffCost: make shape consistent #7

Merged: 7 commits into anassinator:master on May 5, 2019

Conversation

AdamGleave (Contributor)

First of all, thanks for the great repo -- cleanest implementation of iLQR I've found and easily extensible for my needs.

The current implementation of BatchAutoDiffCost passes inputs without a batch dimension to l, l_x and l_u (i.e. x.shape == (state_size,)), but inputs with a batch dimension to l_xx, l_ux and l_uu (i.e. x.shape == (batch_size, state_size)).

Although the cost function l can be written to handle both inputs, I find it quite fiddly to get this right, and it means doing things like summing over axis=-1 since axis=1 doesn't exist in the non-batch case. In this PR I change the input to always have a batch in the first dimension (sometimes of length 1). I also redefine i to be a scalar, which I think it always will be.
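
To make the mismatch concrete, here is a minimal sketch (illustrative only, not code from the repo): a reduction written with axis=-1 builds for both of the shapes above, while axis=1 only exists once there is a batch dimension.

import numpy as np
import theano
import theano.tensor as T

# Hypothetical illustration of the two input shapes described above.
x_vec = T.dvector("x")   # (state_size,): what master passes to l, l_x and l_u
x_mat = T.dmatrix("x")   # (batch_size, state_size): what master passes to l_xx, l_ux and l_uu

# axis=-1 is valid for both ranks; .sum(axis=1) would fail to build for x_vec.
cost_vec = T.square(x_vec).sum(axis=-1)
cost_mat = T.square(x_mat).sum(axis=-1)

f_vec = theano.function([x_vec], cost_vec)
f_mat = theano.function([x_mat], cost_mat)

print(f_vec(np.array([1.0, 2.0])))      # 5.0
print(f_mat(np.array([[1.0, 2.0]])))    # [5.] -- a batch of length 1, as in this PR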

anassinator (Owner) left a comment

Thanks for the kind words and the contributions!

Could you give a small example of what this solves exactly? I always just used the [..., i] syntax to support both as in the examples in the README. Is that insufficient? Also, would BatchAutoDiffDynamics need to change?
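
For reference, the [..., i] idiom mentioned here looks roughly like the following (a hypothetical snippet, not the README example): ellipsis indexing selects a state component whether or not a batch dimension is present, which is one way to write a cost that handles both shapes.

import numpy as np
import theano
import theano.tensor as T

# Hypothetical sketch of the [..., i] idiom: the same indexing expression
# works for x of shape (state_size,) and of shape (batch_size, state_size).
x_vec = T.dvector("x")
x_mat = T.dmatrix("x")

first_vec = theano.function([x_vec], x_vec[..., 0])
first_mat = theano.function([x_mat], x_mat[..., 0])

print(first_vec(np.array([1.0, 2.0])))                # 1.0
print(first_mat(np.array([[1.0, 2.0], [3.0, 4.0]])))  # [1. 3.]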

@@ -353,7 +353,7 @@ def __init__(self, f, state_size, action_size, **kwargs):
         # Prepare inputs.
         self._x = x = T.dvector("x")
         self._u = u = T.dvector("u")
-        self._i = i = T.dvector("i")
+        self._i = i = T.dscalar("i")
anassinator (Owner)

I think we should keep T.dvector here to be consistent with the rest of the Cost and Dynamics implementations. The idea was to support computing the cost of a batch of (x, u, i) pairs in one shot. We can revisit this, but that's probably better done in a separate PR.
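
As a rough sketch of that batched-(x, u, i) idea (hypothetical code, not the repo's API): with i kept as a vector, a time-dependent cost can in principle be evaluated for many (x, i) pairs in one call.

import numpy as np
import theano
import theano.tensor as T

# Hypothetical example of evaluating a time-dependent cost for a batch of
# (x, i) pairs at once; the names and the discount factor are made up.
x = T.dmatrix("x")   # (batch_size, state_size)
i = T.dvector("i")   # (batch_size,): one timestep per row

cost = (0.99 ** i) * T.square(x).sum(axis=1)   # per-row discounted quadratic state cost
batch_cost = theano.function([x, i], cost)

print(batch_cost(np.array([[1.0, 0.0], [0.0, 2.0]]),
                 np.array([0.0, 5.0])))        # one cost value per (x, i) pair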

AdamGleave (Contributor, Author)

Agreed, it seems conceptually cleaner to support evaluating at different i values. I'm not sure I can think of a concrete use case, and it does complicate implementing Cost a little, but I don't have strong feelings here. I originally made this change because l(x, u, i) failed and required calling l(x, u, [i]), but I can fix that directly.
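
Concretely, the call-site issue being described reduces to something like this (a hypothetical sketch, not repo code): with i declared as a dvector, the compiled function rejects a plain scalar timestep, so callers have to wrap it in a list.

import theano
import theano.tensor as T

# Hypothetical sketch of the call-site difference between dvector and dscalar i.
i_vec = T.dvector("i")
i_scl = T.dscalar("i")

f_vec = theano.function([i_vec], i_vec * 2)
f_scl = theano.function([i_scl], i_scl * 2)

f_vec([0])   # works; f_vec(0) raises a TypeError (wrong number of dimensions)
f_scl(0)     # works directly, i.e. l(x, u, i) instead of l(x, u, [i])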

AdamGleave (Contributor, Author)

Should terminal also be a vector if i is? I'm confused how these interact.

AdamGleave (Contributor, Author) commented on Feb 3, 2019

It's possible to write cost functions so that they support both batched and non-batched inputs, but it's error-prone. In particular, it's easy to write a function that works for l, l_x and l_u but fails on l_xx. I tend to assume that if the function evaluates correctly on a few test inputs, then the derivatives, being computed automatically, should be correct as well. So I found this behaviour quite counterintuitive, and it effectively requires testing twice as many cases. Here's a minimal working example of the problem:

import numpy as np
from theano import tensor as T

from ilqr.cost import BatchAutoDiffCost


class DemoCost(BatchAutoDiffCost):
    def __init__(self, axis):
        def f(x, u, i, terminal):
            if terminal:
                control_cost = T.zeros_like(x[..., 0])
            else:
                control_cost = T.square(u).sum(axis=axis)

            state_cost = T.sqrt(T.sum(x, axis=axis))
            cost = state_cost + control_cost
            cost = control_cost
            return cost

        super().__init__(f, state_size=2, action_size=2)

def similar(x, y):
    # True if x and y agree (elementwise) to within 1e-8.
    close = abs(x - y) < 1e-8
    if np.isscalar(close):
        return close
    return close.all()


def test():
    correct_cost = DemoCost(axis=-1)  # this always works
    cost0 = None
    try:
        # This one is wrong, but can be created in both master and PR.
        cost0 = DemoCost(axis=0)
    except Exception as e:
        print('DemoCost(axis=0) failed with', e)
    cost1 = None
    try:
        # This one is correct. It cannot be created in master. It can in PR.
        cost1 = DemoCost(axis=1)
    except Exception as e:
        print('DemoCost(axis=1) failed with', e)

    x = np.array([1, 2])
    u = np.array([1, 2])

    for i, cost in enumerate([cost0, cost1]):
        print('run ', i)
        if cost is None:
            continue
        try:  # on PR branch
            print('l', similar(correct_cost.l(x, u, 0), cost.l(x, u, 0)))
        except Exception as e:  # on master branch
            print('Exception: ', e)
            print('l', similar(correct_cost.l(x, u, [0]), cost.l(x, u, [0])))
        print('l_x', similar(correct_cost.l_x(x, u, 0), cost.l_x(x, u, 0)))
        print('l_u', similar(correct_cost.l_u(x, u, 0), cost.l_u(x, u, 0)))
        print('l_xx', similar(correct_cost.l_xx(x, u, 0), cost.l_xx(x, u, 0)))
        print('l_uu', similar(correct_cost.l_uu(x, u, 0), cost.l_uu(x, u, 0)))


if __name__ == '__main__':
    test()

On my machine, on master this produces:

DemoCost(axis=1) failed with Not enough dimensions on Elemwise{sqr,no_inplace}.0 to reduce on axis 1
l True  # it appears to work...
l_x True                                                                                                                                                                                              
l_u True                                                                                                                                                                                              
l_xx False  # but is actually broken :(
l_uu True

By contrast, in the PR branch:

run  0  # this is wrong, but it's immediately obviously wrong -- it disagrees on l
l False                                                                                                                                                                                               
l_x True
l_u False                                                                                                                                                                                             
l_xx True                                                                                                                                                                                             
l_uu True                                                                                                                                                                                             
run  1  # this is correct
l True                                                                                                                                                                                                
l_x True                                                                                                                                                                                              
l_u True                                                                                                                                                                                              
l_xx True                                                                                                                                                                                             
l_uu True     

anassinator (Owner)

Sorry, I forgot to address this!

I understand what you mean now. For completeness' sake, I was only able to reproduce your example by removing the cost = control_cost line.

And I didn't notice there was a discrepancy with the i vector between l() and l_*(). So you can fix that after all. Would you also mind making the same change to BatchAutoDiffDynamics then so they're both consistent?

AdamGleave (Contributor, Author)

You're right that the cost = control_cost line needs to be removed for my example to reproduce; apologies.

I've changed self._i to be a dscalar in both BatchAutoDiffCost and BatchAutoDiffDynamics. I also noticed that BatchAutoDiffDynamics had a similar issue with an inconsistent number of dimensions, which I've resolved in the same way as in BatchAutoDiffCost.

I believe this addresses all outstanding issues in this PR; let me know if there are any other changes you'd like me to make.

anassinator merged commit d4a0ac0 into anassinator:master on May 5, 2019.