
Non-linear dynamics -- Trajectory Following #1

Closed
mrsp opened this issue Nov 30, 2017 · 8 comments

mrsp commented Nov 30, 2017

Hello and thanks for the interesting package you've created!

I wanted to ask how I can use it for a dynamic model that depends explicitly on time,

i.e. x_dot = (t - u) * x^2,

where u is the input, t is time, and x is the state.

Moreover, can your package be used for Trajectory following?

I.e. minimize J = sum_{i=0}^{N} ( x(i) - x_ref(i) )^T Q ( x(i) - x_ref(i) ) + ( u(i) - u_ref(i) )^T R ( u(i) - u_ref(i) )
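In NumPy terms, what I'm after is something like this (x_ref / u_ref being hypothetical reference trajectory arrays):

```python
import numpy as np

def tracking_cost(xs, us, x_ref, u_ref, Q, R):
    """Quadratic trajectory-tracking cost:
    J = sum_i (x_i - x_ref_i)^T Q (x_i - x_ref_i)
      + sum_i (u_i - u_ref_i)^T R (u_i - u_ref_i)
    """
    J = 0.0
    for x, xr in zip(xs, x_ref):
        d = x - xr
        J += d @ Q @ d
    for u, ur in zip(us, u_ref):
        d = u - ur
        J += d @ R @ d
    return J
```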

Thanks in advance!

anassinator (Owner) commented Nov 30, 2017

Hello!

I just added support for time-varying dynamics models and cost functions in f97ffb3. I haven't tested it as thoroughly yet, so there might be issues.

I think you could set up what you've outlined pretty easily as follows:

import theano.tensor as T
from ilqr.dynamics import AutoDiffDynamics

dt = 0.01
x = T.dscalar("x")
u = T.dscalar("u")
t = T.dscalar("t")

x_dot = (dt * t - u) * x**2
f = T.stack([x + x_dot * dt])

dynamics = AutoDiffDynamics(f, [x], [u], t)

>>> t = 2
>>> x = [2.0]
>>> u = [1.0]
>>> dynamics.f(x, u, t)
... array([ 1.9608])
>>> dynamics.f_x(x, u, t)
... array([[ 0.9608]])

As for your cost function, I believe all you would need is to copy the QRCost class here, but change all instances of x_goal and u_goal to x_ref[t] and u_ref[t].

Essentially, the Dynamics and Cost classes now take an extra parameter t in all of their functions. Note that t here is the current time step (i.e. an integer such that 0 <= t <= T, where T is your horizon), not the current time in seconds. If you need the time in seconds instead, you can simply scale it by dt before using it.

The only issue I see from all of this is that you might need a way to shift your cost function if you are planning to use this as an MPC with the RecedingHorizonController.
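As a rough sketch (this is not in the package, just an illustration of what I mean by shifting), you could wrap a time-indexed cost so that step t maps to t + offset before delegating:

```python
class ShiftedCost:
    """Sketch: offset the time index of a wrapped time-indexed cost.

    Assumes the wrapped cost takes (x, u, t, terminal) as described
    above; l_x, l_u, etc. would be forwarded the same way.
    """

    def __init__(self, cost, offset=0):
        self.cost = cost
        self.offset = offset

    def l(self, x, u, t, terminal=False):
        # Evaluate the wrapped cost at the shifted time step.
        return self.cost.l(x, u, t + self.offset, terminal)
```

Each time the receding horizon advances, you would bump offset so the reference indices stay aligned with the plan.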

Let me know if this works for you and hope this helps!

mrsp (Author) commented Nov 30, 2017

Hello again

Thanks for your quick and helpful reply!

I'll try out your advice tonight and let you know how it goes.

I don't need MPC at the moment, only planning with respect to some non-linear dynamics.

If you haven't implemented it yet, it would be great to add state and control constraints as in
https://homes.cs.washington.edu/~todorov/papers/TassaICRA14.pdf

But still great package and great work @anassinator !

anassinator (Owner) commented
Thanks!

Yeah, I would like to add that at some point. For now, control constraints are already supported through the squashing function outlined in that paper; it's used in both the pendulum and cartpole examples via the constrain() and tensor_constrain() functions here. BoxQP would definitely be better, though. I don't believe that method directly constrains the state space; I'll have to check again.
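For reference, the squashing idea is essentially just a scaled tanh; a minimal standalone version (not necessarily byte-for-byte what's in the package) looks like:

```python
import numpy as np

def squash(u, min_bound, max_bound):
    """Smoothly map an unbounded control u into (min_bound, max_bound)."""
    diff = (max_bound - min_bound) / 2.0   # half-range
    mean = (max_bound + min_bound) / 2.0   # midpoint
    return diff * np.tanh(u) + mean
```

The optimizer works with the unconstrained u, which gets squashed before entering the dynamics; tensor_constrain does the same thing symbolically on the Theano graph, and constrain applies it numerically to the fitted controls.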

mrsp (Author) commented Nov 30, 2017

Hello again!

I followed your advice and got it running (not quite sure if it is correct/accurate, since I was testing a simple linear model for the trajectory tracking).

So I changed the following:

class QRCost(Cost):

    """Quadratic Regulator Instantaneous Cost."""

    def __init__(self, Q, R, Q_terminal=None, x_goal=None, u_goal=None):
        """Constructs a QRCost.

        Args:
            Q: Quadratic state cost matrix [state_size, state_size].
            R: Quadratic control cost matrix [action_size, action_size].
            Q_terminal: Terminal quadratic state cost matrix
                [state_size, state_size].
            x_goal: Goal state trajectory [T+1, state_size].
            u_goal: Goal control trajectory [T, action_size].
        """
        self.Q = np.array(Q)
        self.R = np.array(R)

        if Q_terminal is None:
            self.Q_terminal = self.Q
        else:
            self.Q_terminal = np.array(Q_terminal)

        if x_goal is None:
            self.x_goal = np.zeros(Q.shape[0])
        else:
            self.x_goal = np.array(x_goal)

        if u_goal is None:
            self.u_goal = np.zeros(R.shape[0])
        else:
            self.u_goal = np.array(u_goal)

        # Assertions removed: x_goal and u_goal are now trajectories,
        # so their shapes no longer match these checks.
        # assert self.Q.shape == self.Q_terminal.shape, "Q & Q_terminal mismatch"
        # assert self.Q.shape[0] == self.Q.shape[1], "Q must be square"
        # assert self.R.shape[0] == self.R.shape[1], "R must be square"
        # assert self.Q.shape[0] == self.x_goal.shape[0], "Q & x_goal mismatch"
        # assert self.R.shape[0] == self.u_goal.shape[0], "R & u_goal mismatch"

        # Precompute some common constants.
        self._Q_plus_Q_T = self.Q + self.Q.T
        self._R_plus_R_T = self.R + self.R.T
        self._Q_plus_Q_T_terminal = self.Q_terminal + self.Q_terminal.T

        super(QRCost, self).__init__()

    def l(self, x, u, t, terminal=False):
        """Instantaneous cost function.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            Instantaneous cost (scalar).
        """
        Q = self.Q_terminal if terminal else self.Q
        R = self.R
        x_diff = x - self.x_goal[t]
        squared_x_cost = x_diff.T.dot(Q).dot(x_diff)

        if terminal:
            return squared_x_cost

        u_diff = u - self.u_goal[t]
        return squared_x_cost + u_diff.T.dot(R).dot(u_diff)

    def l_x(self, x, u, t, terminal=False):
        """Partial derivative of cost function with respect to x.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            dl/dx [state_size].
        """
        Q_plus_Q_T = self._Q_plus_Q_T_terminal if terminal else self._Q_plus_Q_T
        x_diff = x - self.x_goal[t]
        return x_diff.T.dot(Q_plus_Q_T)

    def l_u(self, x, u, t, terminal=False):
        """Partial derivative of cost function with respect to u.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            dl/du [action_size].
        """
        if terminal:
            return np.zeros_like(self.u_goal[t])

        u_diff = u - self.u_goal[t]
        return u_diff.T.dot(self._R_plus_R_T)

Basically: removing the assertions in the init, and adding [t] indexing to the other functions, e.g. l, l_x, and l_u.

Adding [t] to the init method raised errors which I could not handle!

I called it like this:

dist = np.linspace(0, T + 1, T + 1)

# Creating a sin trajectory for the state.
x_goal = np.array([np.sin(dist)]).transpose()
u_goal = 0 * x_goal

cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal, u_goal=u_goal)

Now x_goal and u_goal are trajectories.

If you have a better solution or if you think I made a mistake somewhere in your code, please let me know!

Thanks in advance!

Looking forward to your reply

anassinator (Owner) commented

Yeah, that's what I meant.

By the way, this is how you can do it without commenting out the assertions, and it's more explicit about the sizes required:

class PathQRCost(Cost):

    """Quadratic Regulator Instantaneous Cost for trajectory following."""

    def __init__(self, Q, R, x_path, u_path=None, Q_terminal=None):
        """Constructs a QRCost.

        Args:
            Q: Quadratic state cost matrix [state_size, state_size].
            R: Quadratic control cost matrix [action_size, action_size].
            x_path: Goal state path [T+1, state_size].
            u_path: Goal control path [T, action_size].
            Q_terminal: Terminal quadratic state cost matrix
                [state_size, state_size].
        """
        self.Q = np.array(Q)
        self.R = np.array(R)
        self.x_path = np.array(x_path)

        state_size = self.Q.shape[0]
        action_size = self.R.shape[0]
        path_length = self.x_path.shape[0]

        if Q_terminal is None:
            self.Q_terminal = self.Q
        else:
            self.Q_terminal = np.array(Q_terminal)

        if u_path is None:
            self.u_path = np.zeros((path_length - 1, action_size))
        else:
            self.u_path = np.array(u_path)

        assert self.Q.shape == self.Q_terminal.shape, "Q & Q_terminal mismatch"
        assert self.Q.shape[0] == self.Q.shape[1], "Q must be square"
        assert self.R.shape[0] == self.R.shape[1], "R must be square"
        assert state_size == self.x_path.shape[1], "Q & x_path mismatch"
        assert action_size == self.u_path.shape[1], "R & u_path mismatch"
        assert path_length == self.u_path.shape[0] + 1, "x_path must be 1 longer than u_path"

        # Precompute some common constants.
        self._Q_plus_Q_T = self.Q + self.Q.T
        self._R_plus_R_T = self.R + self.R.T
        self._Q_plus_Q_T_terminal = self.Q_terminal + self.Q_terminal.T

        super(PathQRCost, self).__init__()

    def l(self, x, u, t, terminal=False):
        """Instantaneous cost function.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            Instantaneous cost (scalar).
        """
        Q = self.Q_terminal if terminal else self.Q
        R = self.R
        x_diff = x - self.x_path[t]
        squared_x_cost = x_diff.T.dot(Q).dot(x_diff)

        if terminal:
            return squared_x_cost

        u_diff = u - self.u_path[t]
        return squared_x_cost + u_diff.T.dot(R).dot(u_diff)

    def l_x(self, x, u, t, terminal=False):
        """Partial derivative of cost function with respect to x.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            dl/dx [state_size].
        """
        Q_plus_Q_T = self._Q_plus_Q_T_terminal if terminal else self._Q_plus_Q_T
        x_diff = x - self.x_path[t]
        return x_diff.T.dot(Q_plus_Q_T)

    def l_u(self, x, u, t, terminal=False):
        """Partial derivative of cost function with respect to u.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            dl/du [action_size].
        """
        if terminal:
            return np.zeros(self.R.shape[0])

        u_diff = u - self.u_path[t]
        return u_diff.T.dot(self._R_plus_R_T)

    def l_xx(self, x, u, t, terminal=False):
        """Second partial derivative of cost function with respect to x.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            d^2l/dx^2 [state_size, state_size].
        """
        return self._Q_plus_Q_T_terminal if terminal else self._Q_plus_Q_T

    def l_ux(self, x, u, t, terminal=False):
        """Second partial derivative of cost function with respect to u and x.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            d^2l/dudx [action_size, state_size].
        """
        return np.zeros((self.R.shape[0], self.Q.shape[0]))

    def l_uu(self, x, u, t, terminal=False):
        """Second partial derivative of cost function with respect to u.

        Args:
            x: Current state [state_size].
            u: Current control [action_size]. None if terminal.
            t: Current time step.
            terminal: Compute terminal cost. Default: False.

        Returns:
            d^2l/du^2 [action_size, action_size].
        """
        if terminal:
            return np.zeros_like(self.R)

        return self._R_plus_R_T


T = 100
state_size = 1
action_size = 1

Q = np.eye(state_size)
R = np.eye(action_size)

dist = np.linspace(0, T + 1, T + 1).reshape(-1, 1)
x_path = np.sin(dist)
u_path = np.zeros((T, action_size))

cost = PathQRCost(Q, R, x_path, u_path=u_path)

>>> cost.l([0], [2], 2)
... 4.8114283738935208

mrsp (Author) commented Dec 1, 2017

Hello

I think this is it! I tested it on a class of linear and non-linear systems and it seems to work.
Maybe you should update your repo with it.

Let me ask something else:

If you constrain the input with saturation as you did, can you make the constraints depend on the Theano variables, e.g. x, t, u?

thanks again

anassinator (Owner) commented

Cool! I just added it.

Yeah, you can saturate whatever expression you want; the iLQR solver doesn't make any assumptions about that. You just need to make sure you saturate the output fitted path accordingly as well.

from ilqr.dynamics import tensor_constrain, constrain

# When building dynamics model...
u = tensor_constrain(u + x * t, min_bound, max_bound)

# When fitting...
xs, us = ilqr.fit(x0, us_init)
us = constrain(us + xs * t, min_bound, max_bound)

anassinator (Owner) commented

Closing this for now. You can re-open if you have any more questions.
