
Trajectory optimization not stable #108
Open
yongxf opened this issue Aug 3, 2018 · 1 comment


yongxf commented Aug 3, 2018

Hi there,

Thanks for your excellent code. I am running it with my own MuJoCo model to do peg-in-hole insertion using algorithm_traj_opt only (no neural network yet). The first ~15 iterations look fine and the trajectory is converging.
However, things suddenly get worse after that: the Laplace estimate of the improvement produces a very large value, so the new eta grows very fast, and the program then crashes with a non-PD (non-positive-definite) error.
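(As far as I can tell, the "Laplace estimation" here is the usual second-order approximation of the expected cost under the Gaussian trajectory distribution; the form below is my guess at what is being computed, not copied from the repo:

```latex
% Assumed form of the Laplace estimate of the expected cost under the
% Gaussian state marginal N(mu_t, Sigma_t) at time t.
\mathbb{E}_{x_t \sim \mathcal{N}(\mu_t, \Sigma_t)}\big[c(x_t)\big]
  \approx c(\mu_t) + \tfrac{1}{2}\,\operatorname{tr}\!\big(\Sigma_t \,\nabla_x^2 c(\mu_t)\big)
```

so if the covariance or the cost Hessian blows up, the predicted improvement blows up with it.)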

I checked the iLQR paper, and it does not seem to use any Laplace estimation. Also, Qtt (the combined block of Qxx, Qxu, Quu) has a very different form from the equations in traj_opt_lqr_python.py. The iLQR paper I read is this one: https://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf
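For reference, the backward pass in that paper (with the second-order dynamics terms dropped, as iLQR does) expands the Q-function as

```latex
% Backward-pass expansion from Tassa et al., IROS 2012 (iLQR variant,
% i.e. without the f_{xx}, f_{ux}, f_{uu} terms of full DDP).
Q_x    = \ell_x    + f_x^\top V'_x, \qquad
Q_u    = \ell_u    + f_u^\top V'_x, \\
Q_{xx} = \ell_{xx} + f_x^\top V'_{xx} f_x, \qquad
Q_{ux} = \ell_{ux} + f_u^\top V'_{xx} f_x, \qquad
Q_{uu} = \ell_{uu} + f_u^\top V'_{xx} f_u, \\
k = -Q_{uu}^{-1} Q_u, \qquad K = -Q_{uu}^{-1} Q_{ux},
```

which I could not map directly onto the joint Qtt block (with the eta terms) used in the code.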

Could you point me to the paper behind the Laplace estimation, and the paper whose iLQR formulation you implemented? Appreciate it!


yongxf commented Aug 4, 2018

The instability of iLQR comes from the eta update.
Eta penalizes one of the KL-divergence terms, and it is tuned by comparing kl_div with kl_step.
The problem is:

  1. When the MC cost increases ==> new_mult < 1 ==> step decreases (the actual improvement becomes much smaller than the predicted improvement, so the algorithm tries to reduce the step size).
  2. Step decreases ==> con > 0 (since kl_step = step * kl_base, the theoretical bound becomes stricter; you refer to kl_step in the code as epsilon, which is not correct, since epsilon controls the other KL-divergence term).
  3. con > 0 ==> eta increases (the stricter KL constraint means the current KL divergence now violates it, so more penalty is added, i.e. eta increases). A minimal sketch of this loop follows the list.
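To make the loop concrete, here is a minimal sketch of points 1-3; the names (step_mult, kl_base, eta) and the exact new_mult formula are my guesses at the structure, not code copied from traj_opt_lqr_python.py:

```python
# Hedged sketch of the step/eta feedback loop described in points 1-3.
# All names and formulas are assumptions, not the actual repo code.

def update_step_mult(predicted_impr, actual_impr, step_mult,
                     min_mult=0.1, max_mult=5.0):
    """Point 1: when the MC cost rises, actual_impr drops far below the
    Laplace-predicted improvement, new_mult < 1, and the step shrinks."""
    new_mult = predicted_impr / (2.0 * max(1e-4, predicted_impr - actual_impr))
    new_mult = min(max(new_mult, min_mult), max_mult)
    return new_mult * step_mult

def update_eta(kl_div, step_mult, kl_base, eta, eta_growth=2.0):
    """Points 2-3: kl_step = step_mult * kl_base shrinks with the step, so
    con = kl_div - kl_step turns positive and eta (the KL penalty) grows."""
    kl_step = step_mult * kl_base
    con = kl_div - kl_step
    if con > 0:
        eta *= eta_growth
    return eta

# Net effect: a bad rollout (cost increase) tightens the KL budget and
# inflates eta instead of relaxing anything, which is what I find
# unreasonable in the summary below.
```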

In summary:
when actual cost increases ==> penalization of KL divergence increases.

This is not reasonable, since putting more effort into the KL-divergence term makes the loss term even larger. After several iterations the robot was flailing around wildly.

The first several iterations are normal, though. I guess the scaling of the improvement in the new_mult calculation matters.

Any comment on this?
